Attention-based Gating Network for Robust Segmentation Tracking

Yijin Yang, Xiaodong Gu
DOI: https://doi.org/10.1109/tcsvt.2024.3460400
IF: 5.859
2024-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Visual object tracking is a challenging task that aims to accurately estimate the scale and position of a designated target. Recently, segmentation networks have proven effective in visual tracking, producing outstanding results for target scale estimation. However, segmentation-based trackers still lack robustness due to the presence of similar distractors. To mitigate this issue, we propose an Attention-based Gating Network (AGNet) that produces gating weights to diminish the impact of feature maps linked to similar distractors. Subsequently, we incorporate the AGNet into the segmentation-based tracking paradigm to achieve accurate and robust tracking. Specifically, the AGNet utilizes three cascading Multi-Head Cross-Attention (MHCA) modules to generate gating weights that govern the generation of feature maps in the baseline tracker. The proficiency of the MHCA in modeling global semantic information effectively suppresses feature maps associated with similar distractors. Additionally, we introduce a distractor-aware training strategy that leverages distractor masks to train our model. To alleviate the issue of partial occlusion, we introduce a box refinement module to enhance the accuracy of the predicted target box. Comprehensive experiments conducted on 11 challenging tracking benchmarks show that our approach significantly surpasses the baseline tracker across all metrics and achieves excellent results on multiple tracking benchmarks.
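The gating idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract only states that three cascaded MHCA modules produce gating weights for feature maps. Everything else here is an assumption made for illustration, including the use of one query token per feature-map channel, the residual connection, the sigmoid gate head, the channel-wise multiplication, and all names (`mhca`, `gating_weights`, `apply_gates`, `make_layer`) and sizes.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mhca(query, context, params, num_heads):
    """Multi-head cross-attention: query (Nq, D) attends to context (Nk, D)."""
    wq, wk, wv, wo = params
    nq, d = query.shape
    nk = context.shape[0]
    dh = d // num_heads
    # Project and split into heads: (H, N, dh).
    q = (query @ wq).reshape(nq, num_heads, dh).transpose(1, 0, 2)
    k = (context @ wk).reshape(nk, num_heads, dh).transpose(1, 0, 2)
    v = (context @ wv).reshape(nk, num_heads, dh).transpose(1, 0, 2)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh), axis=-1)  # (H, Nq, Nk)
    out = (attn @ v).transpose(1, 0, 2).reshape(nq, d)               # merge heads
    return query + out @ wo  # residual connection (assumed)

def gating_weights(channel_queries, context_tokens, layers, w_gate, num_heads=4):
    """Cascade the MHCA layers, then map each channel token to a gate in (0, 1)."""
    x = channel_queries
    for params in layers:  # three cascaded modules, as in the abstract
        x = mhca(x, context_tokens, params, num_heads)
    logits = (x @ w_gate)[:, 0]
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid gate head (assumed)

def apply_gates(feature_maps, gates):
    """Scale each channel of (C, H, W) feature maps by its gate."""
    return feature_maps * gates[:, None, None]

def make_layer(rng, d):
    # Hypothetical random projections standing in for learned weights.
    return tuple(0.1 * rng.standard_normal((d, d)) for _ in range(4))

rng = np.random.default_rng(0)
C, D, N = 8, 16, 10                                # channels, token dim, context tokens
channel_queries = rng.standard_normal((C, D))      # one token per channel (assumed)
context_tokens = rng.standard_normal((N, D))       # e.g. flattened search-region features
layers = [make_layer(rng, D) for _ in range(3)]
w_gate = 0.1 * rng.standard_normal((D, 1))
feature_maps = rng.standard_normal((C, 5, 5))

gates = gating_weights(channel_queries, context_tokens, layers, w_gate)
gated = apply_gates(feature_maps, gates)
```

Because each gate lies strictly between 0 and 1, a channel that responds mainly to a distractor can be attenuated without being removed outright. In a trained model, the gates would be learned so that this attenuation targets distractor-related channels.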