Global-local feature-mixed network with template update for visual tracking

Li Zhao, Chenxiang Fan, Min Li, Zhonglong Zheng, Xiaoqin Zhang
DOI: https://doi.org/10.1016/j.patrec.2024.11.034
IF: 4.757
2024-12-08
Pattern Recognition Letters
Abstract: Deep learning trackers have succeeded thanks to their powerful local and global feature extraction capacity. However, neither Siamese-based trackers, which rely on local convolution, nor Transformer-based trackers, which rely on a global Transformer, fully utilize the frames. As a result, these trackers cannot track accurately when the target's appearance changes. This paper proposes a global-local feature-mixed tracker named GLT that combines the complementary advantages of global and local frame features. GLT uses depth-wise convolution with dynamic weights to extract local features and a residual Transformer to extract global features. By exploiting both global and local details, the method performs accurate and robust tracking. In addition, GLT employs a key-frame-based template update strategy to address the challenges of long-term tracking. Extensive experiments show that GLT achieves excellent performance on short-term and long-term benchmarks, including GOT-10k, TrackingNet and LaSOT. Furthermore, because it avoids the many attention operations used by other Transformer-based trackers, GLT has fewer parameters and runs in real time.
computer science, artificial intelligence