Object fusion tracking for RGB-T images via channel swapping and modal mutual attention
Tian Luan, Hui Zhang, Jiafeng Li, Jing Zhang, Li Zhuo
DOI: https://doi.org/10.1109/jsen.2023.3305501
IF: 4.3
2023-01-01
IEEE Sensors Journal
Abstract: RGB-thermal (RGB-T) dual-modal imaging significantly broadens the observation dimensions of a vision system. However, effectively harnessing the inherent advantages of the different spectral bands and building fusion solutions that are tightly coupled with the end task remain highly challenging. This article proposes a modality fusion approach for RGB-T tracking that combines channel swapping and cross-modal attention. We explore a hierarchical fusion method adapted to deep features at different levels of abstraction. For low-level features, cross-modal information is introduced by swapping feature channels between the modalities, which increases the diversity of the unimodal data at low computational cost. To exploit the semantic representation of high-level deep features and the heterogeneous information in multimodal data, a fusion structure based on modal mutual attention is designed; it enhances the fused RGB-T feature representation by integrating modal self-attention and cross-modal attention. Experimental results on public datasets show that the proposed algorithm is both effective and computationally efficient, achieving state-of-the-art tracking performance while running in real time.
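The two fusion ideas named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the abstract does not say which channels are exchanged or how the self-attention and cross-modal attention outputs are combined, so the `every` parameter, the plain scaled dot-product attention, and the element-wise sum used below are all assumptions made for illustration, as are the function names.

```python
import math


def swap_channels(rgb, thermal, every=2):
    """Exchange every `every`-th channel between two modalities.

    `rgb` and `thermal` are lists of channels (each channel may be any
    object, e.g. a 2-D feature map). Which channels get swapped is an
    assumption; the abstract only states that channels are swapped.
    """
    if len(rgb) != len(thermal):
        raise ValueError("both modalities need the same number of channels")
    rgb_out, thermal_out = list(rgb), list(thermal)
    for c in range(0, len(rgb), every):
        rgb_out[c], thermal_out[c] = thermal[c], rgb[c]
    return rgb_out, thermal_out


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of token vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def mutual_attention(rgb_tokens, thermal_tokens):
    """Combine self-attention and cross-modal attention per modality.

    The element-wise sum is an assumed combination rule, not taken
    from the paper.
    """
    def add(a, b):
        return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

    rgb_fused = add(
        attention(rgb_tokens, rgb_tokens, rgb_tokens),          # self
        attention(rgb_tokens, thermal_tokens, thermal_tokens),  # cross
    )
    thermal_fused = add(
        attention(thermal_tokens, thermal_tokens, thermal_tokens),
        attention(thermal_tokens, rgb_tokens, rgb_tokens),
    )
    return rgb_fused, thermal_fused
```

In this sketch the channel swap involves no arithmetic at all, which is consistent with the low computational cost the abstract claims for the low-level fusion step, while the attention step is where each modality's tokens are reweighted using the other modality's content.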
engineering, electrical & electronic; instruments & instrumentation; physics, applied