LoReTrack: Efficient and Accurate Low-Resolution Transformer Tracking

Shaohua Dong, Yunhe Feng, Qing Yang, Yuewei Lin, Heng Fan
2024-05-28
Abstract: High-performance Transformer trackers have shown excellent results, yet they often bear a heavy computational load. Observing that a smaller input can immediately and conveniently reduce computation without changing the model, an easy solution is to adopt low-resolution input for efficient Transformer tracking. Although faster, this significantly hurts tracking accuracy due to the information lost at low resolution. In this paper, we aim to mitigate such information loss to boost the performance of low-resolution Transformer tracking via dual knowledge distillation from a frozen high-resolution (but not larger) Transformer tracker. The core lies in two simple yet effective distillation modules, query-key-value knowledge distillation (QKV-KD) and discrimination knowledge distillation (Disc-KD), applied across resolutions. The former, from the global view, allows the low-resolution tracker to inherit the features and interactions from the high-resolution tracker, while the latter, from the target-aware view, enhances the target-background distinguishing capacity by imitating discriminative regions from its high-resolution counterpart. With the dual knowledge distillation, our Low-Resolution Transformer Tracker (LoReTrack) enjoys not only high efficiency owing to reduced computation but also enhanced accuracy by distilling knowledge from the high-resolution tracker. In extensive experiments, LoReTrack with a 256x256 resolution consistently improves on the baseline with the same resolution, and shows competitive or even better results compared to the 384x384 high-resolution Transformer tracker, while running 52% faster and saving 56% of MACs. Moreover, LoReTrack is resolution-scalable. With a 128x128 resolution, it runs at 25 fps on a CPU with 64.9%/46.4% SUC scores on LaSOT/LaSOT_ext, surpassing all other CPU real-time trackers. Code will be released.
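The abstract's observation that a smaller input reduces computation without changing the model can be illustrated with a short token-count calculation. This is a minimal sketch, assuming a ViT-style backbone that splits the input into non-overlapping 16x16 patches; the patch size and the function names `token_count` and `relative_token_saving` are illustrative assumptions, not details taken from the paper.

```python
def token_count(image_size: int, patch_size: int = 16) -> int:
    """Number of non-overlapping square patches (tokens) in a square image.

    patch_size=16 is an assumed, commonly used value, not one stated in the abstract.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    side = image_size // patch_size
    return side * side


def relative_token_saving(low_res: int, high_res: int, patch_size: int = 16) -> float:
    """Fraction of tokens removed when the input shrinks from high_res to low_res."""
    low = token_count(low_res, patch_size)
    high = token_count(high_res, patch_size)
    return 1.0 - low / high


if __name__ == "__main__":
    for size in (384, 256, 128):
        print(f"{size}x{size}: {token_count(size)} tokens")
    saving = relative_token_saving(256, 384)
    print(f"256 vs 384: {saving:.1%} fewer tokens")
```

Under this assumption, a 256x256 input produces 256 tokens against 576 for a 384x384 input, so about 56% of the tokens disappear. Token count is only a proxy for cost: the MACs and speed figures quoted in the abstract are the authors' measurements, and self-attention cost grows faster than linearly in the number of tokens.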
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### The Problem Addressed by the Paper

This paper aims to improve the tracking accuracy of low-resolution Transformer trackers while keeping their efficiency. Although high-resolution Transformer trackers perform well, they carry a heavy computational burden, which limits their deployment in practical applications. Reducing the input resolution significantly decreases the computational load and therefore increases tracking speed, but it also causes information loss, which degrades tracking accuracy. The paper therefore proposes a dual knowledge distillation framework that distills knowledge from a frozen high-resolution Transformer tracker to mitigate this information loss, improving the accuracy of the low-resolution tracker without sacrificing its speed.

### Solution

To achieve this goal, the authors propose a method called LoReTrack, which includes two key knowledge distillation modules:

1. **Query-Key-Value Knowledge Distillation (QKV-KD)**:
   - From a global perspective, it allows the low-resolution tracker to inherit features and interactions from the high-resolution tracker.
   - Distillation is performed on the queries, keys, and values of the multi-head self-attention mechanism, enabling the low-resolution tracker to learn finer-grained information from the high-resolution model.
2. **Discrimination Knowledge Distillation (Disc-KD)**:
   - From a target-aware perspective, it enhances the target-background discrimination ability of the low-resolution tracker.
   - By mimicking the discriminative regions generated by the high-resolution model, it improves the performance of the low-resolution tracker in complex scenarios.

### Experimental Results

LoReTrack is evaluated on multiple benchmark datasets, including LaSOT, LaSOT_ext, GOT-10k, TrackingNet, and UAV123. Specifically:

- At a resolution of 256², LoReTrack improves the success rate (SUC) by 1.6% over the baseline at the same resolution (e.g., OSTrack-256). It also achieves competitive or even better results than the 384² high-resolution tracker while running 52% faster on GPU and saving 56% of multiply-accumulate operations (MACs).
- At a resolution of 128², LoReTrack runs at 25 fps on CPU, with success rates of 64.9% and 46.4% on LaSOT and LaSOT_ext, respectively, surpassing all other real-time CPU trackers.

### Conclusion

The main contributions of this paper include:

1. Proposing an efficient and accurate low-resolution Transformer tracking method, LoReTrack, which improves the performance of low-resolution tracking.
2. Introducing two novel knowledge distillation modules (QKV-KD and Disc-KD) that enable the low-resolution tracker to learn fine-grained and discriminative information from the high-resolution model.
3. Demonstrating through extensive experiments that LoReTrack improves tracking accuracy while maintaining high-speed operation.
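
As a rough illustration of the two distillation modules described under Solution, the sketch below shows one way a low-resolution student could be matched to a frozen high-resolution teacher. It is not the authors' formulation. Every design choice here is an assumption made for illustration: the teacher map is aligned to the student grid with adaptive average pooling, the feature-level term is a mean squared error on single-channel maps (real queries, keys, and values are per-head vectors), and the discrimination term is a KL divergence between softmax-normalized score maps. All function names are hypothetical.

```python
import math


def adaptive_avg_pool(grid, out_size):
    """Average-pool a square 2D grid of floats to out_size x out_size.

    Bin edges use floor/ceil boundaries, so the input size need not be a
    multiple of out_size (e.g. a 24x24 teacher grid to a 16x16 student grid).
    """
    in_size = len(grid)
    pooled = []
    for i in range(out_size):
        r0 = (i * in_size) // out_size
        r1 = ((i + 1) * in_size + out_size - 1) // out_size  # ceil division
        row = []
        for j in range(out_size):
            c0 = (j * in_size) // out_size
            c1 = ((j + 1) * in_size + out_size - 1) // out_size
            cells = [grid[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def flatten(grid):
    return [v for row in grid for v in row]


def softmax(values, temperature=1.0):
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp((v - m) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def feature_kd_loss(student_map, teacher_map):
    """Assumed stand-in for feature-level distillation: MSE to the pooled teacher map."""
    target = flatten(adaptive_avg_pool(teacher_map, len(student_map)))
    pred = flatten(student_map)
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def discrimination_kd_loss(student_scores, teacher_scores, temperature=1.0, eps=1e-12):
    """Assumed stand-in for discrimination distillation: KL(teacher || student)
    between softmax-normalized score maps on the student grid."""
    target = adaptive_avg_pool(teacher_scores, len(student_scores))
    p = softmax(flatten(target), temperature)
    q = softmax(flatten(student_scores), temperature)
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


if __name__ == "__main__":
    # Toy 6x6 teacher map with a single peak, and an uninformative 4x4 student map.
    teacher = [[float(r == 2 and c == 3) for c in range(6)] for r in range(6)]
    student = [[0.0] * 4 for _ in range(4)]
    print("feature KD loss:", feature_kd_loss(student, teacher))
    print("discrimination KD loss:", discrimination_kd_loss(student, teacher))
```

In a training loop, terms like these would typically be added to the tracker's usual loss with weighting coefficients while the teacher's parameters stay fixed; the weights and the exact layers to distill from are not specified in the summary above.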