GM-DETR: Generalized Multispectral DEtection TRansformer with Efficient Fusion Encoder for Visible-Infrared Detection

Yiming Xiao, Fanman Meng, Qingbo Wu, Linfeng Xu, Mingzhou He, Hongliang Li
DOI: https://doi.org/10.1109/cvprw63382.2024.00563
2024-01-01
Computer Vision and Pattern Recognition
Abstract: Multispectral object detection based on RGB and IR achieves more accurate and robust performance by integrating complementary information from different modalities. However, existing methods predominantly focus on effectively fusing information from both modalities to enhance detection performance, and rarely study the diversified utilization of RGB and IR data or explore the adaptability of the model to practical application scenarios. We first analyze how datasets are utilized for multispectral object detection and compare the resulting test performance. To better leverage datasets and address more generalized model application scenarios, we propose a Generalized Multispectral DEtection TRansformer (GM-DETR) with a two-stage training strategy. Specifically, we design the Modality-Specific Feature Interaction (MSFI) module to extract high-level information from RGB and IR, and propose the Cross-Modality-Scale feature Fusion (CMSF) module, which performs multi-scale cross-modality fusion of the RGB and IR features. Our GM-DETR achieves state-of-the-art performance on the FLIR and LLVIP benchmark datasets.