Improving RGB-infrared object detection with cascade alignment-guided transformer
Maoxun Yuan, Xiaorong Shi, Nan Wang, Yinyan Wang, Xingxing Wei
DOI: https://doi.org/10.1016/j.inffus.2024.102246
IF: 18.6
2024-01-10
Information Fusion
Abstract: The integration of multispectral data in object detection, especially visible and infrared images, has received considerable attention recently. Complementary information from visible (RGB) and infrared (IR) images can mitigate the challenges posed by variable lighting conditions, making such image pairs a valuable resource in many tasks, including RGB-IR object detection, RGB-IR semantic segmentation, and RGB-IR crowd counting. However, existing methods still suffer from weak misalignment and fusion imprecision problems, both of which present significant challenges for accurate detection. In this paper, our primary focus is to solve these two problems in RGB-IR object detection tasks. Specifically, we first propose a Translation-Scale-Rotation Alignment (TSRA) module to align the features of the two modalities from region proposals. Based on the aligned region features, we introduce a Complementary Fusion Transformer (CFT) module to capture the complementary features. These two modules can be coupled in a unified Region of Interest (RoI) detection head called Cascade Alignment-Guided Transformer (CAGT) to obtain the final robust fused features. Finally, based on CAGT, a region feature alignment and fusion detector called CAGTDet is constructed for RGB-IR object detection. Comprehensive experiments on the aerial DroneVehicle dataset show that our method effectively mitigates the impact of these two issues, resulting in robust detection results. Moreover, to evaluate the generalization of our method, we also perform experiments on natural images sampled from the KAIST multispectral pedestrian benchmark. The results show that our method surpasses other state-of-the-art methods.
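To illustrate the kind of geometric correction that a translation-scale-rotation alignment implies, the following is a minimal sketch and not the authors' TSRA module. It assumes an oriented region is parameterized as (cx, cy, w, h, theta) and that the deviation between modalities is given as a pixel shift, multiplicative scale factors, and an additive angle in radians. The function names `apply_tsr` and `box_corners` and this parameterization are hypothetical, chosen only for illustration.

```python
import math


def apply_tsr(box, dx, dy, scale_w, scale_h, dtheta):
    """Apply a translation, scale, and rotation offset to an oriented box.

    Assumed (hypothetical) parameterization: box = (cx, cy, w, h, theta),
    with theta in radians. The offsets are a pixel shift (dx, dy),
    multiplicative scale factors, and an additive rotation.
    """
    cx, cy, w, h, theta = box
    return (cx + dx, cy + dy, w * scale_w, h * scale_h, theta + dtheta)


def box_corners(box):
    """Return the four corner points of an oriented box."""
    cx, cy, w, h, theta = box
    c, s = math.cos(theta), math.sin(theta)
    corners = []
    # Rotate each half-extent offset by theta, then shift to the centre.
    for ox, oy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        corners.append((cx + ox * c - oy * s, cy + ox * s + oy * c))
    return corners


# Example: a region proposal from one modality, corrected by an assumed offset.
rgb_region = (10.0, 20.0, 4.0, 2.0, 0.0)
ir_region = apply_tsr(rgb_region, dx=1.0, dy=-1.0, scale_w=1.5, scale_h=2.0, dtheta=0.25)
ir_corners = box_corners(ir_region)
```

In a detector, offsets of this kind would be predicted per region proposal and used to resample the features of one modality before fusion; how the offsets are predicted and applied in CAGT is not specified in the abstract.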
computer science, artificial intelligence, theory & methods