RGBT Tracking by Fully-Convolutional Triple Networks with Cosine Embedding Loss

Ping Zhang, Jin Luo, Muyang Li, Chunming Gao, Changke Wu
DOI: https://doi.org/10.1145/3512353.3512367
2022-01-01
Abstract: RGBT tracking, which aims to fuse complementary information from visible and thermal images for robust object tracking, has drawn much attention in computer vision in recent years. Many works have extensively explored fusing features from convolutional networks to integrate the modalities, especially their modality-specific properties. Although these methods achieve quite good performance, they do not sufficiently represent and interpret the commonness and specificity of the modalities, or the relationship between them, which are significant for RGBT tracking. In this work, we propose a novel triple network that extracts Modal-Common and Modal-Specific features and thereby interprets Modal Common-Specific information from multi-modal images. In addition, a corresponding cosine embedding loss is designed to differentiate the features and make them discriminative. To perceive the complementary information of modal-specific features, we propose a cross-modal attention-query module, in which each modality queries the channel attention of the other modality and enhances its own relevant channels. Moreover, we build an efficient tracker on a fully-convolutional siamese network for real-time RGBT tracking. Extensive experiments on two RGBT benchmark datasets demonstrate the excellent performance and efficiency of our method compared with classic RGB trackers and other state-of-the-art RGBT tracking algorithms.
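The two mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact loss or module design, so the code below assumes the standard cosine embedding loss (similar pairs pulled together, dissimilar pairs pushed apart up to a margin) and a simple sigmoid-of-global-average channel attention. All function names (`cosine_embedding_loss`, `channel_attention`, `cross_modal_query`) and the `margin` parameter are hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(a, b, eps=1e-8):
    """Cosine of the angle between two flat feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / max(norm_a * norm_b, eps)


def cosine_embedding_loss(a, b, target, margin=0.0):
    """Standard cosine embedding loss (an assumption, not the paper's exact form).

    target = 1  -> the pair should be similar:    loss = 1 - cos(a, b)
    target = -1 -> the pair should be dissimilar: loss = max(0, cos(a, b) - margin)
    """
    cos = cosine_similarity(a, b)
    if target == 1:
        return 1.0 - cos
    return max(0.0, cos - margin)


def channel_attention(feature):
    """One weight per channel: sigmoid of the channel's mean activation.

    `feature` is a list of channels, each a flat list of activations.
    This pooling-plus-sigmoid form is an assumed, simplified attention.
    """
    weights = []
    for channel in feature:
        mean = sum(channel) / len(channel)
        weights.append(1.0 / (1.0 + math.exp(-mean)))
    return weights


def cross_modal_query(own, other):
    """Rescale `own` channels with attention weights computed from `other`."""
    weights = channel_attention(other)
    return [[w * v for v in channel] for w, channel in zip(weights, own)]


# Hypothetical toy features: two channels with two activations each.
rgb_specific = [[1.0, 1.0], [2.0, 2.0]]
thermal_specific = [[0.0, 0.0], [0.0, 0.0]]
enhanced_rgb = cross_modal_query(rgb_specific, thermal_specific)
```

Under these assumptions, a plausible use is to apply the loss with `target = 1` to the common features of the two modalities and with `target = -1` to their specific features, so that common features align while specific features stay distinct. Whether the paper pairs the features this way is not stated in the abstract.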