Multi-window Transformer Parallel Fusion Feature Pyramid Network for Pedestrian Orientation Detection

Xiao Li, Shexiang Ma, Liqing Shan
DOI: https://doi.org/10.1007/s00530-022-00993-9
IF: 3.9
2022-01-01
Multimedia Systems
Abstract: In complex traffic scenes, the orientation and location of pedestrians are important criteria for judging their intentions. Pedestrians vary widely in appearance, while the differences among orientations (especially adjacent orientations) are small, so general object detection algorithms perform poorly at extracting features for this task. Extracting more discriminative features is therefore an effective way to address the problem. To this end, we propose a novel framework that enhances feature extraction for pedestrian orientation detection (orientation classification and location regression). The framework consists of two modules: a multi-window Transformer parallel fusion feature pyramid (MTPF) and a gated graph (GG). The MTPF module performs multi-layer feature fusion, improving the feature representation of the prediction map by extracting high-level semantic information from deep layers and recovering missing contextual information from shallow layers. Specifically, sliding windows are set on multiple prediction maps, and the windowed features are fused by a Transformer. In the GG module, each region proposal is abstracted into a graph with six nodes, where each node represents a body part. We use GG to learn the spatial dependencies among body parts and to learn features by aggregating information from neighboring nodes. Finally, pedestrian orientation classification and location regression are performed on the resulting graph, which contains rich relationships among nodes. According to our survey, no existing methods or datasets can be used directly for pedestrian orientation detection, so we manually annotate pedestrian orientations on three public datasets containing a large number of pedestrian samples and compare the proposed method with current state-of-the-art object detection methods. The results demonstrate the effectiveness of the proposed method.
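The neighbor aggregation described for the GG module can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract states only that a region proposal becomes a six-node graph of body parts; the part names, the torso-centered edge list, the feature dimension, and the particular gating formula below are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical body-part names; the abstract only says there are six nodes.
PARTS = ["head", "torso", "left_arm", "right_arm", "left_leg", "right_leg"]

# Assumed connectivity: the head and every limb connect to the torso.
EDGES = [(0, 1), (1, 2), (1, 3), (1, 4), (1, 5)]


def build_adjacency(num_nodes, edges):
    """Build a row-normalised adjacency matrix from undirected edges (no self-loops)."""
    adj = np.zeros((num_nodes, num_nodes))
    for i, j in edges:
        adj[i, j] = 1.0
        adj[j, i] = 1.0
    deg = adj.sum(axis=1, keepdims=True)
    return adj / np.maximum(deg, 1.0)  # guard against isolated nodes


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gated_graph_step(h, adj, w_msg, w_gate):
    """One gated neighbour-aggregation step over node features h (nodes x dim).

    The gate form is an assumption: a sigmoid over the concatenation of each
    node's current state and its aggregated message.
    """
    messages = adj @ h @ w_msg  # mean of neighbour features, then a linear map
    gate = sigmoid(np.concatenate([h, messages], axis=1) @ w_gate)
    # Blend the transformed message with the previous state, channel by channel.
    return gate * np.tanh(messages) + (1.0 - gate) * h


rng = np.random.default_rng(0)
dim = 8  # assumed feature size per body part
adj = build_adjacency(len(PARTS), EDGES)
h0 = rng.normal(size=(len(PARTS), dim))          # stand-in for pooled part features
w_msg = rng.normal(scale=0.1, size=(dim, dim))   # random stand-ins for learned weights
w_gate = rng.normal(scale=0.1, size=(2 * dim, dim))
h1 = gated_graph_step(h0, adj, w_msg, w_gate)
```

In this sketch the torso node averages information from all five other parts, while each remaining node receives information only from the torso. Stacking several such steps would let information travel between, for example, an arm and a leg via the torso.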