Abstract: Because multispectral data are complementary, they can significantly improve pedestrian detection performance, and multispectral pedestrian detection has therefore received great attention from the research community. However, existing pedestrian detection algorithms still suffer from problems such as insufficient information exchange between the two streams and a lack of network design targeted at the characteristics of the image source. In practical application scenarios, different targeted network models are generally used for day and night, and the day model and night model can simply be switched during inference. We therefore propose two subnetworks, FTHd (Fusion Transformer Histogram day) and FTn (Fusion Transformer night), designed for the characteristics of daytime and nighttime images respectively. In the daytime, the texture features of RGB images are more pronounced, so FTHd first adds a histogram layer to the input branch of the detection network and then adds a CFT (Cross-Modal Fusion Transformer) module to fuse features and let them interact. By leveraging the Transformer’s self-attention, the network can naturally perform intra-modal and inter-modal fusion. At night, light is very weak and thermal images play a key role. Because texture information is weak, a complex network structure is not required, so FTn combines the two streams into one to reduce computation and then adds a CFT module to fuse features and let them interact. Compared with baseline methods, the proposed FTHd and FTn achieve improved pedestrian detection accuracy.
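
The claim that self-attention performs intra-modal and inter-modal fusion "naturally" can be illustrated with a minimal sketch. It is not the CFT module itself: it assumes single-head attention with identity projections (no learned weights), tiny hand-written token vectors, and hypothetical helper names (`self_attention`, `fuse_modalities`). The point it shows is that once RGB and thermal tokens are concatenated into one sequence, each attention row covers both modalities, so the same operation mixes tokens within a modality and across modalities.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of attention scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    # Single-head scaled dot-product self-attention.
    # Assumption: Q = K = V = tokens (identity projections); a real
    # Transformer block would apply learned projection matrices.
    d = len(tokens[0])
    scale = math.sqrt(d)
    weights = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in tokens]
        weights.append(softmax(scores))
    fused = [
        [sum(w * v[j] for w, v in zip(row, tokens)) for j in range(d)]
        for row in weights
    ]
    return fused, weights

def fuse_modalities(rgb_tokens, thermal_tokens):
    # Concatenate both modalities into one sequence, attend, then split back.
    tokens = rgb_tokens + thermal_tokens
    fused, weights = self_attention(tokens)
    n_rgb = len(rgb_tokens)
    return fused[:n_rgb], fused[n_rgb:], weights

# Toy tokens (hypothetical values): 2 RGB tokens and 3 thermal tokens, dim 2.
rgb = [[1.0, 0.0], [0.0, 1.0]]
thermal = [[1.0, 1.0], [0.5, 0.0], [0.0, 0.5]]

rgb_out, thermal_out, attn = fuse_modalities(rgb, thermal)

# For the first RGB token, split its attention mass by modality.
intra_modal = sum(attn[0][:len(rgb)])   # RGB token attending to RGB tokens
inter_modal = sum(attn[0][len(rgb):])   # RGB token attending to thermal tokens
print(f"intra-modal weight: {intra_modal:.3f}, inter-modal weight: {inter_modal:.3f}")
```

Each row of `attn` sums to one, and the row splits into an intra-modal part and an inter-modal part. This split is what lets a single attention layer handle both kinds of fusion without a separate cross-modal operator.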

Semantically Enhanced Multi-scale Feature Pyramid Fusion for Pedestrian Detection

Feature Fusing of Feature Pyramid Network for Multi-Scale Pedestrian Detection

Pedestrian Detection Based on Multi-Scale Fusion Features

Multi‐scale Pedestrian Detection Based on Self‐attention and Adaptively Spatial Feature Fusion

Small-scale Pedestrian Detection Based on Multi-Level Feature Fusion

Multi-Scale Structure Perception and Global Context-Aware Method for Small-Scale Pedestrian Detection

Improving Multispectral Pedestrian Detection with Scale‐aware Permutation Attention and Adjacent Feature Aggregation

Delving Deep into Multiscale Pedestrian Detection Via Single Scale Feature Maps

Multi-scale Feature Balance Enhancement Network for Pedestrian Detection

A Part-Aware Multi-Scale Fully Convolutional Network for Pedestrian Detection

Locality guided cross-modal feature aggregation and pixel-level fusion for multispectral pedestrian detection

PRF-Ped: Multi-scale Pedestrian Detector with Prior-based Receptive Field

Multi-Scale Feature Pyramid Network: A Heavily Occluded Pedestrian Detection Network Based on ResNet

Pedestrian Detection Using Multi-Channel Visual Feature Fusion by Learning Deep Quality Model

Deep Feature Fusion by Competitive Attention for Pedestrian Detection

PFEL-Net: A lightweight network to enhance feature for multi-scale pedestrian detection

Pedestrian Detection Algorithm Based on Multi-Scale Feature Extraction and Attention Feature Fusion

Multi-Grained Deep Feature Learning for Pedestrian Detection

Transformer fusion and histogram layer multispectral pedestrian detection network

Pedestrian detection based on channel feature fusion and enhanced semantic segmentation

Multi‐scale Pedestrian Detection with Global–local Attention and Multi‐scale Receptive Field Context