Illumination-Guided Transformer-Based Network for Multispectral Pedestrian Detection

Fuchen Chu, Jiale Cao, Zhuang Shao, Yanwei Pang
DOI: https://doi.org/10.1007/978-3-031-20497-5_28
2022-01-01
Abstract: Multi-modal information (e.g., visible and thermal) can yield reliable and robust pedestrian detection results in various computer vision applications. Despite these broad applications, how to fuse the two modalities effectively remains a crucial problem. The self-attention operator of the transformer can capture long-range dependencies and integrate information across the entire input, and it has been widely used for cross-modal fusion. However, there is still a lack of analysis and design of transformers specifically for the multispectral pedestrian detection task. To benefit from both the RGB and thermal modalities, we propose a novel illumination-guided transformer-based network (ITNet) for multispectral pedestrian detection. First, unlike previous methods that apply the original transformer structure directly, we design two different transformer-based fusion modules that make the RGB and thermal modalities complement each other. Second, an illumination-guided module adaptively re-weights and fuses the multi-modal features according to the illumination conditions. Extensive evaluations on two benchmarks demonstrate the effectiveness of the proposed approach for multispectral pedestrian detection.
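
The abstract does not say how the illumination-guided module computes its weights, so the following is only a minimal sketch of the general idea of illumination-dependent re-weighting, not the ITNet implementation. It assumes a single scalar illumination score (`day_score`, hypothetical) that is squashed by a sigmoid into an RGB weight, with the thermal weight as its complement. The function names and the flat feature vectors are illustrative assumptions.

```python
import math


def illumination_weight(day_score: float) -> float:
    """Map a hypothetical illumination logit to an RGB weight in (0, 1).

    Assumption: a larger day_score means a brighter scene.
    """
    return 1.0 / (1.0 + math.exp(-day_score))


def fuse_features(rgb, thermal, day_score):
    """Blend two equal-length feature vectors with illumination-dependent weights.

    Assumption: the two weights sum to 1, so the result is a convex
    combination of the RGB and thermal features.
    """
    if len(rgb) != len(thermal):
        raise ValueError("feature vectors must have the same length")
    w_rgb = illumination_weight(day_score)
    w_thermal = 1.0 - w_rgb
    return [w_rgb * r + w_thermal * t for r, t in zip(rgb, thermal)]


# Toy features standing in for per-modality feature maps.
rgb_feat = [1.0, 2.0, 3.0]
thermal_feat = [3.0, 2.0, 1.0]

fused_neutral = fuse_features(rgb_feat, thermal_feat, 0.0)   # equal weights
fused_day = fuse_features(rgb_feat, thermal_feat, 20.0)      # RGB dominates
fused_night = fuse_features(rgb_feat, thermal_feat, -20.0)   # thermal dominates
```

In this sketch, a strongly positive score pushes the fused features toward the RGB input and a strongly negative score pushes them toward the thermal input, which mirrors the intuition that thermal imagery is more informative in poor lighting.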