Exploring fusion strategies for accurate RGBT visual object tracking
Zhangyong Tang, Tianyang Xu, Hui Li, Xiao-Jun Wu, XueFeng Zhu, Josef Kittler
DOI: https://doi.org/10.1016/j.inffus.2023.101881
IF: 18.6
2023-06-18
Information Fusion
Abstract: We address the problem of multi-modal object tracking in video and explore various options for fusing the complementary information conveyed by the visible (RGB) and thermal infrared (TIR) modalities, including pixel-level, feature-level and decision-level fusion. Specifically, in contrast to the existing approaches, we propose and develop a paradigm for combining multi-modal information by image fusion at the pixel level. At the feature level, two different kinds of fusion strategies are investigated for completeness, i.e., the attention-based online fusion strategy and the offline-trained fusion block. At the decision level, a novel fusion strategy is put forward, inspired by the success of the simple averaging configuration, which has shown considerable promise. The effectiveness of the proposed decision-level fusion strategy stems from a number of innovative contributions, including a dynamic weighting of the RGB and TIR contributions and a linear template update operation. A variant of the proposed decision fusion method produced the winning tracker at the Visual Object Tracking Challenge 2020 (VOT-RGBT2020). A comprehensive comparison of the innovative pixel-level and feature-level fusion strategies with the proposed decision-level fusion method highlights the advantages of fusing multi-modal information at the decision score level. Extensive experimental results on five challenging datasets, i.e., GTOT, VOT-RGBT2019, RGBT234, LasHeR and VOT-RGBT2020, demonstrate the effectiveness and robustness of the proposed method compared to the state-of-the-art approaches. The code is available at https://github.com/Zhangyong-Tang/DFAT.
computer science, artificial intelligence, theory & methods
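Illustrative sketch: the abstract names two ingredients of the decision-level strategy, a dynamic weighting of the RGB and TIR contributions and a linear template update, without giving their exact form. The snippet below is a minimal sketch of what such operations could look like, not the authors' implementation. The function names, the peak-score weighting rule and the update rate are all assumptions introduced here for illustration; the actual method is in the linked repository.

```python
def fuse_scores(rgb_scores, tir_scores):
    """Fuse two flattened response maps by a weighted average.

    Assumption: each modality's weight is its peak response (clamped at 0),
    normalized so the two weights sum to 1. This rule is hypothetical and
    only stands in for the paper's dynamic weighting.
    """
    peak_rgb = max(max(rgb_scores), 0.0)
    peak_tir = max(max(tir_scores), 0.0)
    total = peak_rgb + peak_tir
    # Fall back to simple averaging when neither modality is confident.
    w_rgb = peak_rgb / total if total > 0 else 0.5
    w_tir = 1.0 - w_rgb
    fused = [w_rgb * r + w_tir * t for r, t in zip(rgb_scores, tir_scores)]
    return fused, w_rgb, w_tir


def update_template(template, new_features, rate=0.1):
    """Linear template update: interpolate between old and new features.

    Assumption: `rate` is a hypothetical learning rate in [0, 1].
    """
    return [(1.0 - rate) * t + rate * f for t, f in zip(template, new_features)]


if __name__ == "__main__":
    fused, w_rgb, w_tir = fuse_scores([0.1, 0.9], [0.2, 0.3])
    print("weights:", w_rgb, w_tir)
    print("fused scores:", fused)
    print("template:", update_template([1.0, 0.0], [0.0, 1.0], rate=0.25))
```

With equal peaks the weights reduce to 0.5 each, which recovers the simple averaging configuration the abstract cites as its starting point.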