Adaptive multimodal feature fusion with frequency domain gate for remote sensing object detection

Xu Sun, Yinhui Yu, Qing Cheng; School of Communication Engineering, Jilin University, Changchun, China
DOI: https://doi.org/10.1080/2150704x.2024.2305177
IF: 2.369
2024-01-19
Remote Sensing Letters
Abstract: Fusing complementary information from the visible and infrared radiation modalities can improve object detection performance on unmanned aerial vehicle (UAV) remote sensing images captured under insufficient illumination. Although prior work has studied this setting, it has rarely considered making multimodal feature fusion adaptive, which limits how much multimodal detectors can improve. To this end, we propose an adaptive multimodal feature fusion method with a frequency domain gate based on DINO (detection transformer with improved denoising anchor boxes), called multimodal DINO. In our approach, a multimodal feature encoder with underlying feature sharing is designed, which efficiently extracts common and differential features through RGB-guided infrared radiation data transformation. Additionally, an adaptive frequency domain gate is introduced to dynamically learn, for each sample, the degree of dependence on the frequency-filtered features of each modality. We evaluate the proposed method on two multimodal remote sensing detection datasets, VEDAI and DroneVehicle. Extensive experiments demonstrate that our approach achieves superior performance compared to basic detectors and existing multimodal detection methods. Our code is available at https://github.com/cq100/multimodalDINO.
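The abstract does not give the gate's formulation, so the following is only a minimal NumPy sketch of the general idea of a per-sample gate over frequency-filtered features; it is not the authors' implementation. Everything in it is an assumption made for illustration: the function names (`low_pass_filter`, `gated_fusion`), the choice of a low-pass filter with a fixed `cutoff`, the global-average-pooled descriptor, and the softmax over two modality weights.

```python
import numpy as np

def low_pass_filter(feat, cutoff=0.25):
    """Keep spatial frequencies whose radius is <= cutoff (hypothetical filter choice).

    feat has shape (C, H, W); frequencies are in cycles per pixel, in [-0.5, 0.5).
    """
    _, h, w = feat.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    mask = (np.sqrt(fy ** 2 + fx ** 2) <= cutoff).astype(feat.dtype)
    spectrum = np.fft.fft2(feat, axes=(-2, -1))
    # The mask is symmetric, so the inverse transform is real up to rounding error.
    return np.fft.ifft2(spectrum * mask, axes=(-2, -1)).real

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gated_fusion(rgb_feat, ir_feat, weight, bias, cutoff=0.25):
    """Fuse two modalities with sample-dependent weights (illustrative only).

    weight has shape (2, 2*C) and bias has shape (2,); in a real model these
    would be learned parameters.
    """
    rgb_f = low_pass_filter(rgb_feat, cutoff)
    ir_f = low_pass_filter(ir_feat, cutoff)
    # Per-sample descriptor: global average pooling of each filtered feature map.
    descriptor = np.concatenate([rgb_f.mean(axis=(1, 2)), ir_f.mean(axis=(1, 2))])
    gate = softmax(weight @ descriptor + bias)  # two non-negative weights summing to 1
    fused = gate[0] * rgb_f + gate[1] * ir_f
    return fused, gate

# Toy usage with random stand-ins for RGB and infrared feature maps.
rng = np.random.default_rng(0)
rgb = rng.standard_normal((4, 8, 8))
ir = rng.standard_normal((4, 8, 8))
weight = 0.1 * rng.standard_normal((2, 8))
bias = np.zeros(2)
fused, gate = gated_fusion(rgb, ir, weight, bias)
```

Because the gate is computed from the input features themselves, the two modality weights change from sample to sample, which is the sense of "adaptive" assumed in this sketch.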
imaging science & photographic technology, remote sensing