Misaligned Visible-Thermal Object Detection: A Drone-based Benchmark and Baseline

Kechen Song,Xiaotong Xue,Hongwei Wen,Yingying Ji,Yunhui Yan,Qinggang Meng
DOI: https://doi.org/10.1109/TIV.2024.3398429
IF: 8.2
2024-01-01
IEEE Transactions on Intelligent Vehicles
Abstract:Multispectral object detection has achieved remarkable results due to its ability to fuse information from visible and thermal modalities in recent years. However, the existing visible-thermal datasets are constructed based on manually aligned image pairs, which cannot fully represent the challenges of real-world scenarios where image pairs are often misaligned. Existing methods for visible-thermal object detection are based on aligned data and are limited by the accuracy of registration. To address the above issues, we propose a dataset, namely DVTOD, which is a misaligned visible-thermal object detection dataset captured by drones. DVTOD includes 16 challenging attributes and 54 capture scenes. Furthermore, we introduce a cross-modal alignment detector (CMA-Det) for misaligned visible-thermal object detection. Firstly, we design an alignment network to estimate the visible-to-thermal deformation field, which is used to correct for misalignment of the corresponding visible and thermal features. Secondly, we propose a strategy called Object Search Rectification (OSR) to improve the robustness of feature alignment. To better remove the interference of complex backgrounds, a bi-directional feature correction fusion module (BFCFM) is designed to calibrate bimodal features by exploiting the correlation of channel and spatial information between two modalities. CMA-Det outperforms existing methods on the DVTOD dataset and two other visible-thermal object detection datasets. The dataset and code will be published at <uri xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">https://github.com/VDT-2048/DVTOD</uri> .
What problem does this paper attempt to address?