Towards Modalities Correlation for RGB-T Tracking
Xiantao Hu, Bineng Zhong, Qihua Liang, Shengping Zhang, Ning Li, Xianxian Li
DOI: https://doi.org/10.1109/tcsvt.2024.3396289
IF: 5.859
2024-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Recently, RGB-T tracking methods have made significant progress and have shown a remarkable ability to handle the complexities of tracking in demanding environments. However, these methods overlook the instability of modality validity in real-world scenarios, where either modality may become unreliable. This limits the model's ability to understand the correlation between modalities and thereby prevents it from fully exploiting the synergy between RGB and TIR. To address this challenge, we propose a novel RGB-T tracking model, MCTrack, designed from the perspective of leveraging the correlation among modalities. First, in the feature extraction stage, we design a novel module based on channel matching modeling that constructs a bidirectional flow of channel context information between the two modalities. Through this information flow, modality-specific correlation information is transmitted to both modalities, adaptively strengthening the correlation between them. Next, after the feature extraction network, the features of each modality are decoded and transformed into more strongly correlated feature representations. In this stage, we exploit the correlation among modalities to extract distinctive and common features. These features are then fused to generate search-region features dedicated to localization. This helps the model understand the correlation between RGB and TIR in complex scenarios, improving its ability to capture and use key features. Extensive experiments on four popular RGB-T tracking benchmarks show that our model achieves superior performance, notably a Precision of 71.6% on the LasHeR dataset.
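In RGB-T tracking benchmarks, a Precision figure such as the one quoted above is typically a precision rate: the fraction of frames whose predicted box center lies within a fixed pixel distance of the ground-truth center, commonly 20 pixels. The following is a minimal pure-Python sketch, assuming (x, y, w, h) boxes and an inclusive distance threshold. The function names are hypothetical. This is not the official LasHeR evaluation toolkit, which may handle details such as invalid ground-truth frames differently.

```python
import math
from typing import Sequence, Tuple

# Assumed box format: (x, y, w, h) with (x, y) the top-left corner.
Box = Tuple[float, float, float, float]


def box_center(box: Box) -> Tuple[float, float]:
    """Return the (cx, cy) center of an (x, y, w, h) box."""
    x, y, w, h = box
    return x + w / 2.0, y + h / 2.0


def precision_rate(pred: Sequence[Box], gt: Sequence[Box],
                   threshold: float = 20.0) -> float:
    """Fraction of frames whose center location error is <= threshold pixels.

    Hypothetical helper for illustration; the threshold default of 20 px
    is an assumption based on common benchmark practice.
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must cover the same number of frames")
    if not pred:
        raise ValueError("at least one frame is required")
    hits = 0
    for p, g in zip(pred, gt):
        (px, py), (gx, gy) = box_center(p), box_center(g)
        # Euclidean distance between predicted and ground-truth centers.
        if math.hypot(px - gx, py - gy) <= threshold:
            hits += 1
    return hits / len(pred)
```

For example, with a ground-truth box of (100, 100, 40, 40) in four frames and predictions shifted by 0, 10, 30 and 25 pixels, two frames fall within 20 pixels, so `precision_rate` returns 0.5. Benchmarks usually also plot this rate across a range of thresholds, which amounts to calling the function with different `threshold` values.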