Robust Tracking via Unifying Pretrain-Finetuning and Visual Prompt Tuning

Guangtong Zhang, Qihua Liang, Ning Li, Zhiyi Mo, Bineng Zhong
DOI: https://doi.org/10.1145/3595916.3626410
2023-12-06
Abstract: Finetuning is a widely used paradigm for the supervised training of top-performing trackers. However, it faces one key issue: it is unclear how best to finetune a pretrained model for tracking tasks while alleviating catastrophic forgetting. To address this problem, we propose a novel partial finetuning paradigm for visual tracking that unifies pretrain-finetuning and visual prompt tuning (named UPVPT). It can efficiently learn knowledge from the tracking task while reusing the prior knowledge learned by the pretrained model to handle various challenges in tracking. First, to maintain the pretrained prior knowledge, we design a prompt-style method that freezes some parameters of the pretrained network. Then, to learn knowledge from the tracking task, we update the parameters of the prompt and MLP layers. As a result, we can not only retain useful prior knowledge of the pretrained model by freezing the backbone network but also effectively learn target-domain knowledge by updating the prompt and MLP layers. Furthermore, the proposed UPVPT can easily be embedded into existing Transformer trackers (e.g., OSTracker and SwinTracker) by adding only a small number of model parameters (less than 1% of a backbone network). Extensive experiments on five tracking benchmarks (i.e., UAV123, GOT-10k, LaSOT, TNL2K, and TrackingNet) demonstrate that the proposed UPVPT can improve the robustness and effectiveness of the model, especially in complex scenarios.
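To make the freeze/update split described in the abstract concrete, the following is a minimal, dependency-free sketch of the parameter bookkeeping, not the authors' implementation. It assumes a ViT-Base-like backbone (embedding width 768, 12 blocks, MLP ratio 4), one set of 8 learnable prompt tokens per block, and that "MLP layers" means the feed-forward sub-layer inside each Transformer block. All sizes, parameter names, and the per-block prompt layout are hypothetical choices for illustration; the abstract does not specify them.

```python
# Illustrative sizes only (assumptions, not values from the paper).
EMBED_DIM = 768      # token embedding width of a ViT-Base-like backbone
DEPTH = 12           # number of Transformer blocks
MLP_RATIO = 4        # hidden width of the MLP relative to EMBED_DIM
NUM_PROMPTS = 8      # assumed number of learnable prompt tokens per block


def block_param_counts(dim, mlp_ratio):
    """Weight + bias counts for the sub-modules of one Transformer block."""
    hidden = dim * mlp_ratio
    return {
        "attn.qkv": dim * 3 * dim + 3 * dim,
        "attn.proj": dim * dim + dim,
        "norm1": 2 * dim,
        "norm2": 2 * dim,
        "mlp.fc1": dim * hidden + hidden,
        "mlp.fc2": hidden * dim + dim,
    }


def build_param_table(dim, depth, mlp_ratio, num_prompts):
    """Map hypothetical parameter names to their element counts."""
    table = {}
    for i in range(depth):
        for name, count in block_param_counts(dim, mlp_ratio).items():
            table[f"blocks.{i}.{name}"] = count
        # Newly added prompt tokens for block i (not part of the backbone).
        table[f"prompts.{i}"] = num_prompts * dim
    return table


def is_prompt(name):
    return name.startswith("prompts.")


def is_trainable(name):
    """Update prompts and MLP sub-layers; keep everything else frozen."""
    return is_prompt(name) or ".mlp." in name


def summarize(table):
    """Aggregate counts for backbone, added, trainable, and frozen parameters."""
    backbone = sum(c for n, c in table.items() if not is_prompt(n))
    added = sum(c for n, c in table.items() if is_prompt(n))
    trainable = sum(c for n, c in table.items() if is_trainable(n))
    frozen = sum(c for n, c in table.items() if not is_trainable(n))
    return {
        "backbone": backbone,
        "added": added,
        "trainable": trainable,
        "frozen": frozen,
        "added_ratio": added / backbone,
    }


if __name__ == "__main__":
    stats = summarize(build_param_table(EMBED_DIM, DEPTH, MLP_RATIO, NUM_PROMPTS))
    for key, value in stats.items():
        print(f"{key}: {value}")
```

Under these assumed sizes the added prompt parameters stay well below 1% of the backbone, which is in line with the overhead the abstract reports. In a real training framework the same name-based rule would decide which parameters receive gradients; the attention and normalization weights would stay fixed.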
Computer Science