Learning Multi-Layer Attention Aggregation Siamese Network for Robust RGBT Tracking

Mingzheng Feng, Jianbo Su
DOI: https://doi.org/10.1109/tmm.2023.3310295
IF: 7.3
2024-01-01
IEEE Transactions on Multimedia
Abstract: Recent years have seen Siamese networks widely integrated into RGBT tracking to achieve fast tracking. However, these trackers mostly use only the features of the last output layer and ignore the benefits of multi-layer information. In addition, they often fuse the modalities at the feature level and fail to explore the strength of decision-level fusion, which can reduce their flexibility and independence. In this paper, a novel multi-layer attention aggregation Siamese network operating at the decision level is proposed for robust RGBT tracking. Specifically, a hierarchical channel attention Siamese network recalibrates the multi-layer features extracted from RGB and thermal infrared images, so that the tracker focuses on more discriminative features and learns a robust feature representation. A depth-wise correlation operation then produces separate RGB and thermal response maps. To better exploit the complementary RGB and thermal information, a contribution-aware aggregation network adaptively aggregates these response maps. Finally, a classification and regression network completes the bounding box prediction. Extensive experiments on four large-scale RGBT benchmarks demonstrate strong tracking performance relative to other state-of-the-art trackers.
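The pipeline described in the abstract (channel recalibration, depth-wise correlation per modality, then decision-level aggregation of the response maps) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The squeeze-and-excitation form of the channel attention, the peak-response softmax used for the contribution weights, and all function and parameter names (`channel_attention`, `depthwise_xcorr`, `aggregate`, `w1`, `w2`) are assumptions made for illustration; the abstract does not specify them. Only the depth-wise correlation follows its standard definition, in which each template channel is correlated with the matching search channel.

```python
import numpy as np


def channel_attention(feat, w1, w2):
    """Recalibrate channels of feat (C, H, W) with an SE-style gate.

    Assumed form: global average pool -> ReLU layer -> sigmoid layer.
    w1 has shape (C // r, C) and w2 has shape (C, C // r).
    """
    squeeze = feat.mean(axis=(1, 2))               # (C,) per-channel summary
    hidden = np.maximum(w1 @ squeeze, 0.0)         # ReLU bottleneck
    scale = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # sigmoid gate in (0, 1)
    return feat * scale[:, None, None]


def depthwise_xcorr(search, template):
    """Correlate each template channel with the same search channel.

    search: (C, Hs, Ws), template: (C, Ht, Wt)
    returns a response map of shape (C, Hs - Ht + 1, Ws - Wt + 1).
    """
    c, hs, ws = search.shape
    _, ht, wt = template.shape
    out = np.empty((c, hs - ht + 1, ws - wt + 1))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            window = search[:, i:i + ht, j:j + wt]
            out[:, i, j] = (window * template).sum(axis=(1, 2))
    return out


def aggregate(resp_rgb, resp_tir):
    """Fuse two response maps at the decision level.

    Hypothetical weighting rule: softmax over each modality's peak
    response. The paper learns these weights with a network instead.
    """
    scores = np.array([resp_rgb.max(), resp_tir.max()])
    weights = np.exp(scores - scores.max())        # numerically stable softmax
    weights /= weights.sum()
    fused = weights[0] * resp_rgb + weights[1] * resp_tir
    return fused, weights
```

In this sketch each modality keeps its own branch until `aggregate`, which is what makes the fusion decision-level rather than feature-level: the RGB and thermal features are never concatenated, only their response maps are combined.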