Improving Accuracy and Generalization for Efficient Visual Tracking

Ram Zaveri, Shivang Patel, Yu Gu, Gianfranco Doretto
2024-11-28
Abstract: Efficient visual trackers overfit to their training distributions and lack generalization abilities, resulting in them performing well on their respective in-distribution (ID) test sets and not as well on out-of-distribution (OOD) sequences, imposing limitations to their deployment in-the-wild under constrained resources. We introduce SiamABC, a highly efficient Siamese tracker that significantly improves tracking performance, even on OOD sequences. SiamABC takes advantage of new architectural designs in the way it bridges the dynamic variability of the target, and of new losses for training. Also, it directly addresses OOD tracking generalization by including a fast backward-free dynamic test-time adaptation method that continuously adapts the model according to the dynamic visual changes of the target. Our extensive experiments suggest that SiamABC shows remarkable performance gains in OOD sets while maintaining accurate performance on the ID benchmarks. SiamABC outperforms MixFormerV2-S by 7.6% on the OOD AVisT benchmark while being 3x faster (100 FPS) on a CPU.
Computer Vision and Pattern Recognition, Machine Learning, Multimedia
What problem does this paper attempt to address?
This paper addresses the following problem: efficient visual trackers perform well on in-distribution (ID) datasets but generalize poorly to out-of-distribution (OOD) datasets. This limits their deployment in practical applications, especially in resource-constrained settings. Specifically, although existing efficient trackers are fast, they handle extreme in-the-wild visual conditions, such as low light and occlusion, poorly.

To address this problem, the authors propose SiamABC, an efficient Siamese tracker that aims to significantly improve tracking performance, especially on OOD sequences. SiamABC introduces improvements in three areas:

1. **Architecture Design**:
   - A dual template and a dual search region are introduced to better capture the dynamic changes of the target.
   - A new Fast Mixed Filtration (FMF) layer is designed to enhance the relevant feature representation.
2. **Loss Function**:
   - A new Transitive Relation Loss (TRL) is proposed to help bridge the spatio-temporal visual similarities between the dual template and the dual search region.
   - A Regularization Loss (LReg) is introduced to ensure that the model does not ignore the information in the static template and the search region.
3. **Test-Time Adaptation**:
   - A fast Dynamic Test-Time Adaptation (DTTA) method that requires no back-propagation is adopted, so the model is adjusted continuously according to the dynamic visual changes of the target.

With these improvements, SiamABC remains efficient (for example, reaching 100 FPS on a CPU) while improving substantially on OOD datasets. The reported experiments show that SiamABC outperforms existing methods on multiple benchmark datasets. In particular, on the AVisT benchmark, its AUC score is 7.6% higher than that of MixFormerV2-S, and it is nearly 3 times faster.

### Formula Summary

- The **Transitive Relation Loss (TRL)** and the **Regularization Loss (LReg)** are defined as follows:

  \[ L_{TR} = D(\Omega(F_D, F_T), \Omega(F_t, F_S)) \]

  \[ L_{Reg} = D(\Omega(F_D, F_T), F_t) \]

  where \(D\) is first symmetrized over its two inputs and then evaluated as a cosine distance on the mapped features:

  \[ D(x_1, x_2) = \frac{1}{2} \left( D(h_1(x_1), h_2(x_2)) + D(h_1(x_2), h_2(x_1)) \right) \]

  \[ D(z_1, z_2) = 1 - \frac{z_1}{\lVert z_1 \rVert_2} \cdot \frac{z_2}{\lVert z_2 \rVert_2} \]

- **Total Tracking Loss**:

  \[ L = \lambda_{IoU} L_{IoU} + \lambda_{FL} L_{FL} + \lambda_{TR} L_{TR} + \lambda_{Reg} L_{Reg} \]

- **BN Statistics Update**:

  \[ \mu_{I,t} = (1 - \lambda_{BN}) \mu + \lambda_{BN} \mu_{I,t} \]

  \[ \sigma^2_{I,t} = (1 - \lambda_{BN}) \sigma^2 + \lambda_{BN} \sigma^2_{I,t} \]

These formulas and methods work together to enable SiamABC to significantly improve the tracking performance on OOD datasets while remaining efficient.
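
To make the formulas above concrete, here is a minimal NumPy sketch of the cosine distance, its symmetrized form, the weighted total loss, and the BN statistics blend. It is an illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, features are treated as flat 1-D vectors, and because this summary does not define \(h_1\) and \(h_2\), they are passed in as arbitrary callables. The BN update is read as an assignment, where the left-hand side is the blended statistic.

```python
import numpy as np


def cosine_distance(z1, z2):
    """D(z1, z2) = 1 - (z1 / ||z1||_2) . (z2 / ||z2||_2)."""
    z1 = np.asarray(z1, dtype=float)
    z2 = np.asarray(z2, dtype=float)
    return 1.0 - float(np.dot(z1 / np.linalg.norm(z1), z2 / np.linalg.norm(z2)))


def symmetric_distance(x1, x2, h1, h2):
    """D(x1, x2) = 0.5 * (D(h1(x1), h2(x2)) + D(h1(x2), h2(x1))).

    h1 and h2 are placeholders (assumption): the summary does not say
    what these mappings are, so the caller supplies them.
    """
    return 0.5 * (
        cosine_distance(h1(x1), h2(x2)) + cosine_distance(h1(x2), h2(x1))
    )


def total_loss(l_iou, l_fl, l_tr, l_reg, lam_iou, lam_fl, lam_tr, lam_reg):
    """L = lam_IoU*L_IoU + lam_FL*L_FL + lam_TR*L_TR + lam_Reg*L_Reg."""
    return lam_iou * l_iou + lam_fl * l_fl + lam_tr * l_tr + lam_reg * l_reg


def blend_bn_stats(mu, var, mu_inst, var_inst, lam_bn):
    """Blend stored BN statistics (mu, var) with the statistics of the
    current input (mu_inst, var_inst). No gradients are involved."""
    mu_new = (1.0 - lam_bn) * mu + lam_bn * mu_inst
    var_new = (1.0 - lam_bn) * var + lam_bn * var_inst
    return mu_new, var_new


if __name__ == "__main__":
    identity = lambda x: x  # stand-in for the undefined h1, h2
    print(symmetric_distance([1.0, 0.0], [0.0, 3.0], identity, identity))
    # lam_bn = 0.25 is an arbitrary illustration value, not a paper setting.
    print(blend_bn_stats(0.0, 1.0, 2.0, 3.0, lam_bn=0.25))
```

Because the blend is a plain weighted average, it needs only a forward pass to collect the current input's statistics, which is consistent with the summary's description of the adaptation as free of back-propagation. With `lam_bn = 0` the stored statistics are kept unchanged, and with `lam_bn = 1` they are replaced entirely by the current input's statistics.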