Abstract: Multispectral pedestrian detection aims to detect and locate pedestrians in paired Color and Thermal images, and it is widely used in autonomous driving, video surveillance, and similar applications. Most existing multispectral pedestrian detection algorithms have achieved only limited success because they do not account for the confusion between pedestrian information and background noise in Color and Thermal images. Here we propose a multispectral pedestrian detection algorithm that consists mainly of a cascaded information enhancement module and a cross-modal attention feature fusion module. The cascaded information enhancement module applies channel and spatial attention to the features fused by the cascaded feature fusion block, then multiplies each single-modal feature by the resulting attention weights element by element, which enhances the pedestrian features in each modality and suppresses interference from the background. The cross-modal attention feature fusion module mines the features of the Color and Thermal modalities so that they complement each other, constructs global features by adding the cross-modal complemented features element by element, and applies attention weighting to these global features to fuse the two modalities effectively. Finally, the fused features are fed into the detection head to detect and locate pedestrians. Extensive experiments were performed on two improved sets of annotations (sanitized annotations and paired annotations) of the public KAIST dataset. The experimental results show that our method achieves a lower pedestrian miss rate and more accurate pedestrian detection boxes than the compared methods. In addition, ablation experiments confirm the effectiveness of each module designed in this paper.
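
The data flow described in the abstract can be illustrated with a minimal sketch in plain Python. This is not the paper's implementation: the function names (`attention_weights`, `enhance`, `cross_modal_fuse`) are hypothetical, feature maps are simplified to C channels by N flattened spatial positions, the learned attention layers are replaced by parameter-free averages followed by a sigmoid, and the cascaded feature fusion block is stood in for by an element-wise average. Only the order of operations follows the abstract.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def elementwise(op, a, b):
    """Apply a binary op position by position to two C x N feature maps."""
    return [[op(x, y) for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def attention_weights(feat):
    """Return C x N weights in (0, 1) from a C x N feature map.

    Assumption: parameter-free stand-in for learned channel/spatial attention.
    """
    channels, positions = len(feat), len(feat[0])
    # Channel attention: squash the global average of each channel.
    channel_w = [sigmoid(sum(ch) / positions) for ch in feat]
    # Spatial attention: squash the cross-channel average at each position.
    spatial_w = [
        sigmoid(sum(feat[c][n] for c in range(channels)) / channels)
        for n in range(positions)
    ]
    return [[channel_w[c] * spatial_w[n] for n in range(positions)]
            for c in range(channels)]


def enhance(color, thermal):
    """Cascaded information enhancement (sketch).

    Assumption: the cascaded fusion block is replaced by an element-wise average.
    """
    fused = elementwise(lambda x, y: 0.5 * (x + y), color, thermal)
    weights = attention_weights(fused)
    # Each single-modal feature is multiplied by the shared attention weights.
    mul = lambda x, y: x * y
    return elementwise(mul, color, weights), elementwise(mul, thermal, weights)


def cross_modal_fuse(color, thermal):
    """Cross-modal attention feature fusion (sketch)."""
    add = lambda x, y: x + y
    mul = lambda x, y: x * y
    # Assumption: each modality is complemented by the attention-gated other one.
    color_c = elementwise(add, color, elementwise(mul, thermal, attention_weights(thermal)))
    thermal_c = elementwise(add, thermal, elementwise(mul, color, attention_weights(color)))
    # Global features: element-wise sum of the complemented features,
    # followed by attention weighting.
    global_feat = elementwise(add, color_c, thermal_c)
    return elementwise(mul, global_feat, attention_weights(global_feat))


# Toy 2-channel, 2-position feature maps for each modality.
color = [[1.0, 2.0], [0.0, -1.0]]
thermal = [[0.5, 0.0], [1.0, 1.0]]
enhanced_color, enhanced_thermal = enhance(color, thermal)
fused = cross_modal_fuse(enhanced_color, enhanced_thermal)
```

In a full detector, `fused` would be passed to the detection head. Because the weights lie strictly between 0 and 1, the enhancement step can only attenuate a feature value, which is how this sketch mimics the suppression of background responses.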

MFMANet: a multispectral pedestrian detection network using multi-resolution RGB feature reuse with multi-scale FIR attentions

See Extensively While Focusing on the Core Area for Pedestrian Detection

Locality guided cross-modal feature aggregation and pixel-level fusion for multispectral pedestrian detection

Transformer fusion and histogram layer multispectral pedestrian detection network

Lightweight Cross-Modal Multispectral Pedestrian Detection Based on Spatial Reweighted Attention Mechanism

Multispectral Pedestrian Detection via Simultaneous Detection and Segmentation

Improving Multispectral Pedestrian Detection by Addressing Modality Imbalance Problems

M2FNet: Mask-guided Multi-level Fusion for RGB-T Pedestrian Detection

Improving Multispectral Pedestrian Detection with Scale‐aware Permutation Attention and Adjacent Feature Aggregation

Multispectral pedestrian detection based on feature complementation and enhancement

Multispectral Deep Neural Networks for Pedestrian Detection

Cascaded information enhancement and cross-modal attention feature fusion for multispectral pedestrian detection

INSANet: INtra-INter Spectral Attention Network for Effective Feature Fusion of Multispectral Pedestrian Detection

Deep saliency detection-based pedestrian detection with multispectral multi-scale features fusion network

Pedestrian detection with unsupervised multispectral feature learning using deep neural networks

A Fast RetinaNet Fusion Framework for Multi-Spectral Pedestrian Detection

Multi-scale cross-layer fusion and center position network for pedestrian detection

Attention-Guided Sample-Based Feature Enhancement Network for Crowded Pedestrian Detection Using Vision Sensors

Fusion of Multispectral Data Through Illumination-aware Deep Neural Networks for Pedestrian Detection

PFEL-Net: A lightweight network to enhance feature for multi-scale pedestrian detection

Learning a Dynamic Cross-Modal Network for Multispectral Pedestrian Detection