Visible–Infrared Dual-Sensor Tracking Based on Transformer via Progressive Feature Enhancement and Fusion

Yangliu Kuai, Dongdong Li, Zhinan Gao, Mingwei Yuan, Da Zhang
DOI: https://doi.org/10.1109/JSEN.2024.3372991
IF: 4.3
2024-05-01
IEEE Sensors Journal
Abstract: This article investigates how to achieve accurate RGB-T tracking through effective enhancement of target features and adaptive fusion of the complementary information in the RGB and thermal infrared modalities. Inspired by the transformer's strong ability to model long-range dependencies, we propose a novel transformer-based RGB-T tracking method built on progressive feature enhancement and fusion. The proposed tracker is a two-branch Siamese network with an exemplar branch and a search branch. First, a backbone extracts deep features from the RGB and thermal infrared images. The features in each branch are then enhanced progressively in the channel and spatial dimensions. Specifically, in the channel dimension, a channel attention feature module (CAFM) is designed to adaptively enhance the RGB and thermal infrared features. In the spatial dimension, the transformer self-attention mechanism is integrated with the AiA module to enhance the dual-modality features. Next, the enhanced features from the exemplar and search branches are fused by the transformer cross-attention mechanism, which provides global, deep interaction between the exemplar and search images. Finally, the fused features are fed into a corner predictor head to estimate the target state. Experiments on two widely used public benchmarks (RGBT234 and LasHeR) demonstrate the effectiveness and efficiency of the proposed method compared with many recently released state-of-the-art (SOTA) trackers.
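To make the enhance-then-fuse pipeline in the abstract more concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It assumes features are plain lists of tokens (each token a list of channel values). The names `channel_gate` and `cross_attention` are hypothetical. The sigmoid-of-mean gate is only a stand-in for the learned CAFM, whose actual design is not given in the abstract, and the cross-attention omits the learned query/key/value projections, the AiA module, and the corner predictor head.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def channel_gate(rgb, tir):
    """Hypothetical channel-wise fusion of two modality feature maps.

    rgb, tir: lists of N tokens, each a list of C channel values.
    Each channel's gate is the sigmoid of its mean activation (an assumed
    stand-in for a learned gating network); the gated modalities are summed.
    """
    n, c = len(rgb), len(rgb[0])

    def gates(feat):
        means = [sum(tok[k] for tok in feat) / n for k in range(c)]
        return [1.0 / (1.0 + math.exp(-m)) for m in means]

    g_rgb, g_tir = gates(rgb), gates(tir)
    return [
        [g_rgb[k] * rgb[i][k] + g_tir[k] * tir[i][k] for k in range(c)]
        for i in range(n)
    ]


def cross_attention(search, exemplar):
    """Scaled dot-product cross-attention without learned projections.

    Queries come from the search tokens; keys and values come from the
    exemplar tokens, so every search location attends to every exemplar
    location.
    """
    d = len(search[0])
    fused = []
    for q in search:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in exemplar
        ]
        weights = softmax(scores)
        fused.append(
            [sum(w * v[j] for w, v in zip(weights, exemplar)) for j in range(d)]
        )
    return fused


# Toy usage: 2 exemplar tokens and 3 search tokens with 2 channels each.
exemplar = channel_gate([[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.5, 0.5]])
search = channel_gate([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [[0.0, 0.0]] * 3)
fused = cross_attention(search, exemplar)  # one fused token per search token
```

Because the attention weights for each search token sum to one, every fused token is a weighted average of the exemplar tokens. This is the sense in which each search location interacts globally with the whole exemplar.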