Multiscale Cross-modal Homogeneity Enhancement and Confidence-aware Fusion for Multispectral Pedestrian Detection
Ruimin Li, Jiajun Xiang, Feixiang Sun, Ye Yuan, Longwu Yuan, Shuiping Gou
DOI: https://doi.org/10.1109/tmm.2023.3272471
IF: 7.3
2023-01-01
IEEE Transactions on Multimedia
Abstract: Multispectral pedestrian detection has shown many advantages in a variety of environments, particularly poor illumination conditions, by leveraging visible-thermal modalities. However, in-depth insight into distinguishing the complementary content of multimodal data and exploring the extent of multimodal feature fusion is still lacking. In this paper, we propose a novel multispectral pedestrian detector with multiscale cross-modal homogeneity enhancement and confidence-aware feature fusion. RGB and thermal streams are constructed to extract features and generate candidate proposals. During feature extraction, multiscale cross-modal homogeneity enhancement is proposed to enhance single-modal features using the separated homogeneous features via modal interactions. Homogeneity features encode the semantic information of the scene and are extracted from the RGB-thermal pairs by employing a channel attention mechanism. Proposals from two modalities are united to obtain multimodal proposals. Then, confidence measurement fusion is proposed to achieve multispectral feature fusion in each proposal by measuring the internal confidence of each modality and the interaction confidence between modalities. In addition, a confidence transfer loss function is designed to focus more on hard-to-detect samples during training. Experimental results on two challenging datasets demonstrate that the proposed method achieves better performance compared to existing methods.
computer science, information systems, telecommunications, software engineering
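
Illustrative note: the abstract describes fusing RGB and thermal features per proposal by weighing each modality's internal confidence against the interaction confidence between modalities. The abstract gives no formulas, so the following is only a minimal sketch of that general idea, not the paper's method. Every name and formula here is an assumption made for illustration: `internal_confidence` (one minus binary entropy of a per-modality pedestrian probability), `interaction_confidence` (cosine similarity rescaled to [0, 1]), and the blending rule in `fuse`.

```python
import math


def internal_confidence(p):
    """Hypothetical internal confidence: 1 - binary entropy (bits) of a
    modality's pedestrian probability p in [0, 1]."""
    if p <= 0.0 or p >= 1.0:
        return 1.0
    entropy = -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))
    return 1.0 - entropy


def interaction_confidence(f_rgb, f_thermal):
    """Hypothetical interaction confidence: cosine similarity of the two
    feature vectors, rescaled from [-1, 1] to [0, 1]."""
    dot = sum(a * b for a, b in zip(f_rgb, f_thermal))
    norm = math.sqrt(sum(a * a for a in f_rgb)) * math.sqrt(
        sum(b * b for b in f_thermal)
    )
    if norm == 0.0:
        return 0.5  # no information about agreement
    return 0.5 * (1.0 + dot / norm)


def fuse(f_rgb, f_thermal, p_rgb, p_thermal, eps=1e-8):
    """Blend two per-proposal feature vectors (assumed rule, not the paper's).

    When the modalities agree, average them equally; when they disagree,
    lean toward the modality with the higher internal confidence.
    Returns the fused vector and the weight given to the RGB features.
    """
    c_rgb = internal_confidence(p_rgb)
    c_th = internal_confidence(p_thermal)
    agree = interaction_confidence(f_rgb, f_thermal)
    w_rgb_by_conf = (c_rgb + eps) / (c_rgb + c_th + 2.0 * eps)
    w_rgb = agree * 0.5 + (1.0 - agree) * w_rgb_by_conf
    w_th = 1.0 - w_rgb
    fused = [w_rgb * a + w_th * b for a, b in zip(f_rgb, f_thermal)]
    return fused, w_rgb
```

For example, under these assumptions `fuse([1.0, 0.0], [0.0, 1.0], 0.5, 0.99)` assigns the RGB features a weight below 0.5, since the RGB score is maximally uncertain while the thermal score is not, as might happen in a poorly lit scene.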