Distillation, Ensemble and Selection for building a Better and Faster Siamese based Tracker

Shaochuan Zhao, Tianyang Xu, Xiao-Jun Wu, Josef Kittler
DOI: https://doi.org/10.1109/tcsvt.2022.3177215
IF: 5.859
2022-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Visual object tracking has seen continuous improvements in performance, thanks to the recent emergence of deep CNN learning. More complex CNN models invariably offer better accuracy. However, model complexity conflicts with tracking efficiency, which makes it challenging to balance speed against accuracy. To optimize the trade-off between these two performance criteria, this paper proposes a distillation-ensemble-selection framework. Without any modification to the baseline network architecture, the proposed approach enables the construction of a Siamese-based tracker with improved capacity and efficiency. Specifically, multiple student trackers are designed by means of knowledge distillation from a given teacher tracking model. To handle the varying granularity of unknown targets, an ensemble module combines the outputs of the student trackers with the help of a learnable fine-grained attention module. In addition, in the online tracking stage, a selection module adaptively controls the complexity of the tracker by identifying an appropriate subset of the candidate tracker models. The method is verified in both anchor-based and anchor-free paradigms. Experimental results on standard benchmarking datasets demonstrate its effectiveness, with an outstanding and balanced performance in both accuracy and speed.
engineering, electrical & electronic
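The three stages named in the abstract (distillation, ensemble, selection) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the mean-squared-error distillation loss, the per-position softmax used as "fine-grained attention", the top-k confidence rule used for selection, and all function names below are assumptions chosen only to make the idea concrete. Response maps are flattened to plain lists of floats.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_map, teacher_map):
    """Assumed loss: mean squared error between student and teacher response maps."""
    return sum((s - t) ** 2 for s, t in zip(student_map, teacher_map)) / len(teacher_map)


def ensemble(response_maps, attention_logits):
    """Fuse student response maps position by position.

    Both arguments are indexed as [student][position]. At each position the
    students' logits are turned into weights with a softmax (an assumed stand-in
    for the learnable attention module), and the maps are averaged with them.
    """
    n_positions = len(response_maps[0])
    fused = []
    for i in range(n_positions):
        weights = softmax([logits[i] for logits in attention_logits])
        fused.append(sum(w * rm[i] for w, rm in zip(weights, response_maps)))
    return fused


def select_students(confidences, budget):
    """Assumed selection rule: keep the `budget` most confident students."""
    order = sorted(range(len(confidences)), key=lambda k: confidences[k], reverse=True)
    return sorted(order[:budget])


# Hypothetical usage with two students and a two-position response map.
maps = [[0.0, 1.0], [1.0, 0.0]]
logits = [[0.0, 0.0], [0.0, 0.0]]  # equal logits give a plain average
fused = ensemble(maps, logits)
active = select_students([0.2, 0.9, 0.5], budget=2)
```

In this sketch a smaller `budget` means fewer student models run per frame, which is one simple way to trade accuracy for speed at tracking time; the paper's selection module is described as adaptive, which a fixed budget does not capture.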