Abstract: Efficient visual trackers overfit to their training distributions and lack generalization abilities, resulting in them performing well on their respective in-distribution (ID) test sets and not as well on out-of-distribution (OOD) sequences, imposing limitations to their deployment in-the-wild under constrained resources. We introduce SiamABC, a highly efficient Siamese tracker that significantly improves tracking performance, even on OOD sequences. SiamABC takes advantage of new architectural designs in the way it bridges the dynamic variability of the target, and of new losses for training. Also, it directly addresses OOD tracking generalization by including a fast backward-free dynamic test-time adaptation method that continuously adapts the model according to the dynamic visual changes of the target. Our extensive experiments suggest that SiamABC shows remarkable performance gains in OOD sets while maintaining accurate performance on the ID benchmarks. SiamABC outperforms MixFormerV2-S by 7.6% on the OOD AVisT benchmark while being 3x faster (100 FPS) on a CPU.

What problem does this paper attempt to address?

The paper addresses a gap in efficient visual trackers: they perform well on in-distribution (ID) datasets but generalize poorly to out-of-distribution (OOD) datasets. This limits their deployment in practical applications, especially under resource constraints. Specifically, although existing efficient trackers are fast, they handle extreme in-the-wild visual conditions such as low light and occlusion poorly.

To address this problem, the authors propose SiamABC, an efficient Siamese tracker that aims to significantly improve tracking performance, especially on OOD sequences. SiamABC makes improvements in three areas:

1. **Architecture Design**:
   - A dual-template and a dual-search-region are introduced to better capture the dynamic changes of the target.
   - A new Fast Mixed Filtration (FMF) layer is designed to enhance the relevant feature representation.
2. **Loss Function**:
   - A new Transitive Relation Loss (TRL) is proposed to help bridge the spatio-temporal visual similarities between the dual-template and the dual-search-region.
   - A Regularization Loss (LReg) is introduced to ensure that the model does not ignore the information of the static template and the search region.
3. **Test-Time Adaptation**:
   - A fast dynamic test-time adaptation method without back-propagation (Dynamic Test-Time Adaptation, DTTA) is adopted, enabling the model to be continuously adjusted according to the dynamic visual changes of the target.

With these improvements, SiamABC remains fast (for example, reaching 100 FPS on a CPU) while improving substantially on OOD datasets. The reported experiments show that SiamABC outperforms existing methods on multiple benchmark datasets. In particular, on the AVisT benchmark, its AUC score is 7.6% higher than that of MixFormerV2-S, and it is nearly 3 times faster.
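The backward-free adaptation in item 3 can be illustrated with the BN statistics update listed in the formula summary. The following is a minimal sketch in plain Python, not the authors' implementation. It assumes that \(\mu\) and \(\sigma^2\) are statistics stored at training time and that the \(\mu_{I,t}\) and \(\sigma^2_{I,t}\) on the right-hand side are computed from the current frame's activations. The function names and the default value of `lam_bn` are hypothetical.

```python
import math
from statistics import fmean, pvariance


def blend_bn_stats(stored_mean, stored_var, frame_values, lam_bn=0.1):
    """Interpolate stored BN statistics with statistics of the current frame.

    Assumed reading of the update in the formula summary:
        mean_t = (1 - lam_bn) * stored_mean + lam_bn * frame_mean
        var_t  = (1 - lam_bn) * stored_var  + lam_bn * frame_var
    No gradients are involved, so the update needs no backward pass.
    """
    if not 0.0 <= lam_bn <= 1.0:
        raise ValueError("lam_bn must lie in [0, 1]")
    # Statistics of the current frame's activations (one channel, flattened).
    frame_mean = fmean(frame_values)
    frame_var = pvariance(frame_values, mu=frame_mean)
    mean_t = (1.0 - lam_bn) * stored_mean + lam_bn * frame_mean
    var_t = (1.0 - lam_bn) * stored_var + lam_bn * frame_var
    return mean_t, var_t


def normalize(values, mean, var, eps=1e-5):
    """Apply BN-style normalization with the blended statistics."""
    return [(v - mean) / math.sqrt(var + eps) for v in values]


if __name__ == "__main__":
    activations = [2.0, 4.0]  # toy activations for a single channel
    mean_t, var_t = blend_bn_stats(0.0, 1.0, activations, lam_bn=0.5)
    print(mean_t, var_t)
    print(normalize(activations, mean_t, var_t))
```

Under this reading, `lam_bn = 0` leaves the stored statistics unchanged and `lam_bn = 1` uses only the current frame, so the weight controls how quickly the tracker follows visual changes.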
### Formula Summary

- The **Transitive Relation Loss (TRL)** and the **Regularization Loss (LReg)** are defined as follows:

  \[ L_{TR} = D(\Omega(F_D, F_T), \Omega(F_t, F_S)) \]

  \[ L_{Reg} = D(\Omega(F_D, F_T), F_t) \]

  where

  \[ D(x_1, x_2) = \frac{1}{2} \left( D(h_1(x_1), h_2(x_2)) + D(h_1(x_2), h_2(x_1)) \right) \]

  \[ D(z_1, z_2) = 1 - \frac{z_1}{\|z_1\|_2} \cdot \frac{z_2}{\|z_2\|_2} \]

- **Total Tracking Loss**:

  \[ L = \lambda_{IoU} L_{IoU} + \lambda_{FL} L_{FL} + \lambda_{TR} L_{TR} + \lambda_{Reg} L_{Reg} \]

- **BN Statistics Update**:

  \[ \mu_{I,t} = (1 - \lambda_{BN}) \mu + \lambda_{BN} \mu_{I,t} \]

  \[ \sigma^2_{I,t} = (1 - \lambda_{BN}) \sigma^2 + \lambda_{BN} \sigma^2_{I,t} \]

These formulas and methods work together to enable SiamABC to significantly improve the tracking performance on OOD datasets while remaining efficient.
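The distance and total-loss formulas above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' code. It assumes that \(h_1\) and \(h_2\) are vector-to-vector mappings (the summary does not define them, so they are passed in as callables), that the outputs of \(\Omega\) are already available as feature vectors, and that the inner \(D(z_1, z_2)\) is one minus the cosine similarity. All function names and the example weights are hypothetical.

```python
import math


def cosine_distance(z1, z2):
    """D(z1, z2) = 1 - (z1 / ||z1||_2) . (z2 / ||z2||_2)."""
    dot = sum(a * b for a, b in zip(z1, z2))
    n1 = math.sqrt(sum(a * a for a in z1))
    n2 = math.sqrt(sum(b * b for b in z2))
    if n1 == 0.0 or n2 == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (n1 * n2)


def symmetric_distance(x1, x2, h1, h2):
    """D(x1, x2) = 0.5 * (D(h1(x1), h2(x2)) + D(h1(x2), h2(x1)))."""
    return 0.5 * (
        cosine_distance(h1(x1), h2(x2)) + cosine_distance(h1(x2), h2(x1))
    )


def total_loss(terms, weights):
    """Weighted sum L = sum_k lambda_k * L_k over the named loss terms."""
    return sum(weights[name] * value for name, value in terms.items())


if __name__ == "__main__":
    identity = lambda v: v  # stand-in for h1 and h2 (assumption)
    relation_dynamic = [1.0, 0.0]  # stand-in for Omega(F_D, F_T)
    relation_static = [0.0, 1.0]   # stand-in for Omega(F_t, F_S)
    l_tr = symmetric_distance(relation_dynamic, relation_static,
                              identity, identity)
    terms = {"iou": 0.5, "fl": 0.2, "tr": l_tr, "reg": 0.0}
    weights = {"iou": 1.0, "fl": 1.0, "tr": 0.5, "reg": 1.0}
    print(l_tr, total_loss(terms, weights))
```

With this form of the distance, identical directions give 0, orthogonal vectors give 1, and opposite directions give 2, so minimizing \(L_{TR}\) pulls the two relation features toward the same direction.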

Improving Accuracy and Generalization for Efficient Visual Tracking
