Abstract: Visual tracking has made significant strides due to the adoption of transformer-based models. However, most state-of-the-art trackers struggle to meet real-time processing demands on mobile platforms with constrained computing resources, particularly for real-time unmanned aerial vehicle (UAV) tracking. To achieve a better balance between performance and efficiency, we introduce AVTrack, an adaptive computation framework designed to selectively activate transformer blocks for real-time UAV tracking. The proposed Activation Module (AM) dynamically optimizes the ViT architecture by selectively engaging relevant components, thereby enhancing inference efficiency without significant compromise to tracking performance. Furthermore, to tackle the challenges posed by the extreme changes in viewing angle often encountered in UAV tracking, the proposed method enhances ViTs' effectiveness by learning view-invariant representations through mutual information (MI) maximization. AVTrack thus rests on two design principles: adaptive block activation and view-invariant representation learning. Building on AVTrack, we propose an improved tracker, dubbed AVTrack-MD, which introduces a novel MI maximization-based multi-teacher knowledge distillation (MD) framework. It harnesses the benefits of multiple teachers, specifically the off-the-shelf tracking models from AVTrack, by integrating and refining their outputs to guide the learning process of a compact student network. Specifically, we maximize the MI between the softened feature representations from the multi-teacher models and the student model, leading to improved generalization and performance of the student model, particularly in noisy conditions. Extensive experiments on multiple UAV tracking benchmarks demonstrate that AVTrack-MD not only achieves performance comparable to the AVTrack baseline but also reduces model complexity, resulting in a significant 17% increase in average tracking speed.
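The idea of selectively activating transformer blocks can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the `GatedBlock` class, the mean-pooled linear gate, the fixed `threshold`, and the toy residual update standing in for attention and MLP layers are all hypothetical. The sketch only shows the control flow: each block first computes an activation probability from the current tokens, and the block runs only if that probability clears the threshold.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


class GatedBlock:
    """Toy stand-in for one transformer block with a hypothetical gate."""

    def __init__(self, gate_weights, gate_bias, scale):
        self.gate_weights = gate_weights  # one weight per feature dimension
        self.gate_bias = gate_bias
        self.scale = scale  # strength of the toy residual update

    def activation_probability(self, tokens):
        # Mean-pool the tokens, then apply a linear layer and a sigmoid.
        dim = len(tokens[0])
        pooled = [sum(tok[d] for tok in tokens) / len(tokens) for d in range(dim)]
        score = sum(w * p for w, p in zip(self.gate_weights, pooled))
        return sigmoid(score + self.gate_bias)

    def forward(self, tokens):
        # Placeholder for attention + MLP: a scaled residual update.
        return [[v + self.scale * v for v in tok] for tok in tokens]


def run_adaptive(blocks, tokens, threshold=0.5):
    """Run only the blocks whose gate probability reaches the threshold."""
    executed = []
    for index, block in enumerate(blocks):
        if block.activation_probability(tokens) >= threshold:
            tokens = block.forward(tokens)
            executed.append(index)
        # Otherwise the block is skipped and tokens pass through unchanged.
    return tokens, executed


if __name__ == "__main__":
    blocks = [
        GatedBlock([0.0, 0.0], 5.0, 0.5),   # gate strongly open
        GatedBlock([0.0, 0.0], -5.0, 0.5),  # gate strongly closed
        GatedBlock([0.0, 0.0], 5.0, 0.5),   # gate strongly open
    ]
    out, used = run_adaptive(blocks, [[1.0, 2.0], [3.0, 4.0]])
    print("executed blocks:", used)
    print("output tokens:", out)
```

In this toy setup the skipped block costs only the gate evaluation, which is where the efficiency gain of such a scheme would come from. In a trained model, the gate parameters would be learned jointly with the tracker rather than fixed by hand.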

Learning Adaptive and View-Invariant Vision Transformer with Multi-Teacher Knowledge Distillation for Real-Time UAV Tracking

Adaptive and Background-Aware Vision Transformer for Real-Time UAV Tracking

Learning Motion Blur Robust Vision Transformers with Dynamic Early Exit for Real-Time UAV Tracking

AViTMP: A Tracking-Specific Transformer for Single-Branch Visual Tracking

SGDViT: Saliency-Guided Dynamic Vision Transformer for UAV Tracking

A Unified Transformer-based Tracker for Anti-UAV Tracking

A Vision-based UAV Tracker Aiming at Aerial Targets

Learning Efficient Transformer Representation for Siamese Tracker to UAV

Transformer-based Moving Target Tracking Method for Unmanned Aerial Vehicle

Exploiting Temporal Coherence for Self-Supervised Visual Tracking by Using Vision Transformer

Siamese Transformer Network: Building an Autonomous Real-Time Target Tracking System for UAV

TrackingMamba: Visual State Space Model for Object Tracking

Multi-Source Templates Learning for Real-Time Aerial Tracking

Real-time Adaptive Multi-Classifier Multi-Resolution Visual Tracking Framework for Unmanned Aerial Vehicles

Enhancing Online UAV Multi-Object Tracking with Temporal Context and Spatial Topological Relationships

Cross-Parallel Attention and Efficient Match Transformer for Aerial Tracking

TransTracking for UAV: An Autonomous Real-time Target Tracking System for UAV via Transformer Tracking

Boosting UAV Tracking with Voxel-Based Trajectory-Aware Pre-Training

Multi-step Temporal Modeling for UAV Tracking

Meta Transfer Learning for Adaptive Vehicle Tracking in UAV Videos

AutoTrack: Towards High-Performance Visual Tracking for UAV With Automatic Spatio-Temporal Regularization