Unsupervised RGB-T object tracking with attentional multi-modal feature fusion

Shenglan Li, Rui Yao, Yong Zhou, Hancheng Zhu, Bing Liu, Jiaqi Zhao, Zhiwen Shao
DOI: https://doi.org/10.1007/s11042-023-14362-9
IF: 2.577
2023-02-03
Multimedia Tools and Applications
Abstract: RGB-T tracking means that, given the object position in the first frame, the tracker is trained to predict the position of the object in subsequent frames by taking full advantage of the complementary information in RGB and thermal infrared images. As the amount of data increases, unsupervised training has great potential for development in the RGB-T tracking task. It is well known that features extracted from different convolutional layers provide different levels of information about the image. In this paper, we propose a framework for visual tracking based on attention-based fusion of multi-modal and multi-level features. This fusion method exploits the advantages of both multi-level and multi-modal information. Specifically, we use a feature fusion module to fuse features from different levels and different modalities at the same time. We use cycle consistency based on a correlation filter to implement unsupervised training of the model, which reduces the cost of annotating data. The proposed tracker is evaluated on two popular benchmark datasets, GTOT and RGB-T234. Experimental results show that our tracker performs favorably against other state-of-the-art unsupervised trackers while running at real-time tracking speed.
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering
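
The abstract describes attention-based fusion of RGB and thermal features but gives no implementation details. The sketch below is a rough illustration of the general idea only, not the authors' fusion module: it assumes two feature maps of identical shape (channels, height, width), scores each modality per channel by global average pooling, and blends the two with softmax weights. The function name attention_fuse and the pooling-based scoring are assumptions made for this example.

```python
import numpy as np


def attention_fuse(rgb_feat, tir_feat):
    """Blend two (C, H, W) feature maps with per-channel softmax weights.

    Hypothetical illustration; the paper's actual fusion module is not
    specified in the abstract.
    """
    if rgb_feat.shape != tir_feat.shape:
        raise ValueError("RGB and thermal features must share one shape")

    # Stack modalities on a new leading axis: (2, C, H, W).
    stacked = np.stack([rgb_feat, tir_feat])

    # Global average pooling gives one score per modality and channel: (2, C).
    scores = stacked.mean(axis=(2, 3))

    # Softmax over the modality axis, shifted by the max for stability.
    scores = scores - scores.max(axis=0, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=0, keepdims=True)

    # Broadcast the weights over the spatial dimensions and sum modalities.
    fused = (weights[:, :, None, None] * stacked).sum(axis=0)
    return fused, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb = rng.standard_normal((4, 8, 8))
    tir = rng.standard_normal((4, 8, 8))
    fused, weights = attention_fuse(rgb, tir)
    print(fused.shape, weights.sum(axis=0))
```

In a trained tracker the weights would come from learned layers rather than raw pooled activations; the fixed pooling here only keeps the example self-contained.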