Transformer fusion and histogram layer multispectral pedestrian detection network
Ying Zang, Chenglong Fu, Dongsheng Yang, Hui Li, Chaotao Ding, Qingshan Liu
DOI: https://doi.org/10.1007/s11760-023-02579-y
2023-05-05
Abstract: Multispectral (RGB and thermal) data are complementary, so combining them can significantly improve pedestrian detection performance. For this reason, multispectral pedestrian detection has received considerable attention from the research community. However, existing algorithms still have shortcomings. Information exchange between the two streams is insufficient, and network designs are not tailored to the characteristics of each image source. In practical applications, separate models are generally used for day and night, and the system can simply switch between the day model and the night model at inference time. We therefore propose two subnetworks tailored to daytime and nighttime images: FTHd (Fusion Transformer Histogram day) and FTn (Fusion Transformer night). Texture features are more pronounced in daytime RGB images. FTHd therefore first adds a histogram layer to the input branch of the detection network, and then adds the cross-modal feature fusion module CFT (Cross-Modal Fusion Transformer) so that the features of the two modalities are fused and interact. Through the Transformer's self-attention, the network naturally performs both intra-modal and inter-modal fusion. At night, illumination is very weak and thermal images play the key role. Because texture information is weak, a complex network structure is unnecessary, so FTn merges the two streams into one to reduce computation and then adds a CFT module to fuse the features and let them interact. Compared with baseline methods, FTHd and FTn achieve higher pedestrian detection accuracy.
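The two building blocks named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, and every name and parameter here is hypothetical. `soft_histogram` assumes the radial-basis soft-binning form often used for histogram layers: each bin has a center and a width, and pixel memberships are averaged over local windows. `cross_modal_attention` is a deliberately simplified CFT-style fusion. It flattens RGB and thermal feature maps into tokens, concatenates them, and applies one head of self-attention. As a result, the attention matrix contains RGB-RGB and thermal-thermal blocks (intra-modal) as well as RGB-thermal blocks (inter-modal). The sketch omits positional embeddings, multiple heads, the MLP, and token downsampling, and it uses random rather than learned weights.

```python
import numpy as np


def soft_histogram(x, centers, widths, window):
    """Soft (RBF) histogram of a single-channel map x of shape (H, W).

    Hypothetical sketch: bin b gives each pixel the membership
    exp(-(w_b * (x - mu_b))**2), and memberships are averaged over
    non-overlapping window x window patches -> (B, H // window, W // window).
    """
    h, w = x.shape
    h2, w2 = h // window, w // window
    x = x[: h2 * window, : w2 * window]  # crop so the map tiles evenly
    # (B, H, W) memberships, broadcasting the image against every bin
    member = np.exp(-(widths[:, None, None] * (x[None] - centers[:, None, None])) ** 2)
    # average-pool each bin map over its window x window patches
    return member.reshape(len(centers), h2, window, w2, window).mean(axis=(2, 4))


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_attention(f_rgb, f_thermal, wq, wk, wv):
    """Single-head self-attention over the joint RGB + thermal token set.

    f_rgb, f_thermal: (C, H, W) feature maps of equal shape.
    wq, wk, wv: (C, C) projections (random here; learned in a real network).
    Returns the fused RGB map, the fused thermal map, and the (2N, 2N) attention.
    """
    c, h, w = f_rgb.shape
    n = h * w
    # one token per spatial position; RGB tokens first, thermal tokens second
    tokens = np.concatenate([f_rgb.reshape(c, n).T, f_thermal.reshape(c, n).T], axis=0)
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    attn = softmax(q @ k.T / np.sqrt(c), axis=-1)  # rows mix both modalities
    out = tokens + attn @ v  # residual connection
    return out[:n].T.reshape(c, h, w), out[n:].T.reshape(c, h, w), attn


rng = np.random.default_rng(0)
gray = rng.random((32, 32))  # stand-in for a normalized RGB intensity map
centers = np.linspace(0.0, 1.0, 8)  # hypothetical bin centers
widths = np.full(8, 4.0)  # hypothetical bin widths
hist = soft_histogram(gray, centers, widths, window=8)
print(hist.shape)  # (8, 4, 4)

c = 16
f_rgb = rng.standard_normal((c, 4, 4))
f_th = rng.standard_normal((c, 4, 4))
wq, wk, wv = (rng.standard_normal((c, c)) / np.sqrt(c) for _ in range(3))
fr, ft, attn = cross_modal_attention(f_rgb, f_th, wq, wk, wv)
print(fr.shape, attn.shape)  # (16, 4, 4) (32, 32)
```

In this sketch, `attn[:16, 16:]` holds the weights that RGB tokens place on thermal tokens. That off-diagonal block is what lets one self-attention step carry out inter-modal fusion without a separate cross-attention module.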
engineering, electrical & electronic; imaging science & photographic technology