Lightweight Transformer Tracker: Compact and Effect Neural Network for Object Tracking with Long-Short Range Attention

Zheng Liu, C. Wan, Na Li, Xinyu Liu, Changpei Zha
DOI: https://doi.org/10.1145/3573942.3574038
2022-09-23
Abstract: In recent years, attention mechanisms have been widely used in computer vision tasks such as object detection, tracking, and recognition. Many studies show that attention-based networks perform better than CNN-based or RNN-based networks. However, this better performance comes at a cost that cannot be neglected, such as more complicated structures and more complex algorithms. In this paper, a lightweight transformer tracker named LTT is proposed. Compared with the Transformer Tracking network (TransT), LTT makes three lightweight modifications: first, YOLO-nano Darknet is used as the feature extraction network; second, the template image is scaled to 1/4 of its original size and the self-attention layer is removed; finally, a layer combining convolution and cross-attention (long-short range attention) is adopted for feature fusion. Experiments show that our tracker runs at roughly 70 fps on a GPU with no significant performance loss compared with networks such as SiamFC and SiamRPN++. Moreover, the model size of our tracker is only 4.58M.
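The abstract describes the fusion layer only as a combination of convolution and cross-attention ("long-short range attention"). The sketch below is one plausible reading under stated assumptions, not the authors' implementation. It assumes the channel-split design of the Long-Short Range Attention in Lite Transformer (Wu et al., ICLR 2020): half of the search-feature channels go through cross-attention to the template tokens (long range, in the spirit of TransT's cross-attention), and the other half go through a depthwise convolution (short range). The function names, the 3x3 kernel, the 1x1 output projection, the residual connection, and all shapes are hypothetical, and the weights are random so that only the data flow is shown.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(q_feat, kv_feat, wq, wk, wv):
    """Long-range branch: every search token attends to every template token."""
    q = q_feat @ wq                                  # (Ns, d)
    k = kv_feat @ wk                                 # (Nt, d)
    v = kv_feat @ wv                                 # (Nt, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])          # (Ns, Nt)
    return softmax(scores, axis=-1) @ v              # (Ns, d)


def depthwise_conv2d(x, kernels):
    """Short-range branch: zero-padded depthwise conv on an (H, W, C) map."""
    h, w, _ = x.shape
    kh, kw, _ = kernels.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(x, ((ph, ph), (pw, pw), (0, 0)))
    out = np.zeros_like(x)
    for i in range(kh):
        for j in range(kw):
            # Each channel is filtered by its own kernel slice (no channel mixing).
            out += padded[i:i + h, j:j + w, :] * kernels[i, j, :]
    return out


def long_short_range_fusion(search, template, params):
    """Hypothetical fusion layer: search (H, W, C), template (Ht, Wt, C), C even."""
    h, w, c = search.shape
    half = c // 2
    s_tokens = search[..., :half].reshape(h * w, half)
    t_tokens = template[..., :half].reshape(-1, half)
    long_range = cross_attention(
        s_tokens, t_tokens, params["wq"], params["wk"], params["wv"]
    ).reshape(h, w, half)
    short_range = depthwise_conv2d(search[..., half:], params["dw"])
    fused = np.concatenate([long_range, short_range], axis=-1)  # (H, W, C)
    return fused @ params["w_out"] + search                     # 1x1 projection + residual


def make_params(c, rng, scale=0.1):
    """Hypothetical random weights; a trained tracker would learn these."""
    half = c // 2
    return {
        "wq": rng.standard_normal((half, half)) * scale,
        "wk": rng.standard_normal((half, half)) * scale,
        "wv": rng.standard_normal((half, half)) * scale,
        "dw": rng.standard_normal((3, 3, c - half)) * scale,
        "w_out": rng.standard_normal((c, c)) * scale,
    }


def attention_score_macs(n_query, n_key, dim):
    """Multiply-accumulates for Q @ K^T only (ignores projections and softmax)."""
    return n_query * n_key * dim


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    search = rng.standard_normal((16, 16, 8))   # hypothetical search feature map
    template = rng.standard_normal((4, 4, 8))   # hypothetical template feature map
    out = long_short_range_fusion(search, template, make_params(8, rng))
    print(out.shape)  # (16, 16, 8)
```

The score matrix in `cross_attention` has one row per search token and one column per template token, so its cost (`attention_score_macs`) grows linearly with the number of template tokens. If "1/4 of its original size" is read as 1/4 as many template tokens, that term shrinks by a factor of 4; removing self-attention drops its N x N score matrices altogether.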
Engineering, Computer Science