RGBT Image Fusion Tracking via Sparse Trifurcate Transformer Aggregation Network

Mingzheng Feng, Jianbo Su
DOI: https://doi.org/10.1109/tim.2024.3365162
IF: 5.6
2024-02-27
IEEE Transactions on Instrumentation and Measurement
Abstract: Recent studies have demonstrated the superior tracking ability of the Transformer in RGBT tracking, owing to its global and dynamic modeling properties. However, these Transformer-based trackers pay insufficient attention to the primary feature information and are susceptible to interference from background information. In addition, they often focus on either modality-shared or modality-specific information but fail to adequately exploit the two together. To address these issues, a sparse trifurcate Transformer aggregation network is proposed in this article to enhance tracking robustness. First, a trifurcate tree structure is designed to obtain both modality-shared and modality-specific information, enabling the network to learn more powerful feature representations. Second, a sparse attention mechanism is adopted in the Transformer to focus on the important features. To fully mine the complementary multimodal information, a confidence-aware aggregation network is designed to generate reliability weights for each modality. Finally, a double-head network is introduced to locate the target. Extensive experimental results on multiple RGBT benchmarks, including GTOT, RGBT210, RGBT234, and LasHeR, verify its superior tracking ability against other advanced trackers.
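To make two of the ideas in the abstract concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It assumes that "sparse attention" can be approximated by keeping only the top-k attention scores per query, and that "confidence-aware aggregation" can be approximated by a softmax over per-modality confidence scores. The abstract specifies neither rule, so the function names, the top-k criterion, and the softmax weighting are all illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def topk_sparse_attention(query, keys, values, k):
    """Attend from one query vector to only its k highest-scoring keys.

    Assumption: top-k masking stands in for the paper's sparse attention,
    whose exact form is not given in the abstract.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * x for q, x in zip(query, key)) / scale for key in keys]
    # Indices of the k largest scores; every other position gets zero weight.
    kept = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in kept])
    dim = len(values[0])
    return [sum(w * values[i][d] for w, i in zip(weights, kept)) for d in range(dim)]


def confidence_weighted_fusion(features, confidences):
    """Fuse per-modality feature vectors with softmax-normalized weights.

    Assumption: one scalar confidence per modality (e.g. RGB, thermal),
    produced elsewhere; here they are simply passed in.
    """
    weights = softmax(confidences)
    dim = len(features[0])
    fused = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return fused, weights


if __name__ == "__main__":
    attended = topk_sparse_attention(
        [1.0, 0.0],
        [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],
        [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
        k=2,
    )
    fused, weights = confidence_weighted_fusion(
        [[1.0, 0.0], [0.0, 1.0]], confidences=[2.0, 0.0]
    )
    print(attended, fused, weights)
```

With k equal to the number of keys, the first function reduces to ordinary dense scaled dot-product attention, so k directly controls how much background content can leak into the attended feature.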
engineering, electrical & electronic; instruments & instrumentation