Abstract: Visual object tracking algorithms based on Siamese networks yield promising results through offline training on large benchmarks. However, because they rely on a single initial template, they adapt poorly to changes in the target’s appearance and in the tracking scenario during online tracking. Most existing template update algorithms use the tracking result of the previous frame to update the template. If the tracking results deteriorate, the tracker accumulates incorrect templates, which leads to tracking drift. In this work, we propose a dynamic template updating (DTU) Siamese network, called DTU-Track, that uses status feedback with quality evaluation for visual object tracking. The network dynamically selects the tracking templates for the next frame based on feedback from the tracking result of the previous frame. From that result it calculates two scores: an updating quality score and a tracking quality score. The updating quality score determines whether the current tracking result can be stored in the template library. The tracking quality score quantifies the quality of the tracking status of the previous frame. A quality conversion module uses the tracking quality score to determine the number of templates required for the next frame. The template extraction mechanism then selects high-quality, diverse templates from the template library, enabling the tracker to adapt effectively to changes in the tracking scenario and the target’s appearance in the subsequent frame. Extensive experiments on large-scale benchmarks, including OTB-2015, UAV123, GOT-10k, TrackingNet, and LaSOT, demonstrate our method’s superiority over other algorithms in various tracking scenarios, at a real-time speed of 36 FPS.
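
The update loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Template`, `TemplateLibrary`, `maybe_store`, `templates_needed`, and `select` are hypothetical, as are the threshold of 0.7 and the maximum of 5 templates. The abstract does not say how the tracking quality score maps to a template count, so the sketch assumes that lower tracking quality calls for more templates. It also assumes a greedy quality-times-distance rule as a stand-in for the template extraction mechanism.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Template:
    feature: Sequence[float]  # stand-in for a template embedding (assumption)
    quality: float            # updating quality score at the time of storage


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


@dataclass
class TemplateLibrary:
    update_threshold: float = 0.7  # hypothetical value, not from the paper
    max_templates: int = 5         # hypothetical value, not from the paper
    items: List[Template] = field(default_factory=list)

    def maybe_store(self, feature: Sequence[float], updating_quality: float) -> bool:
        """Store the tracking result only if its updating quality is high enough."""
        if updating_quality < self.update_threshold:
            return False
        self.items.append(Template(feature, updating_quality))
        return True

    def templates_needed(self, tracking_quality: float) -> int:
        """Assumed conversion rule: lower tracking quality -> more templates."""
        q = min(max(tracking_quality, 0.0), 1.0)
        return 1 + round((1.0 - q) * (self.max_templates - 1))

    def select(self, k: int) -> List[Template]:
        """Greedily pick up to k templates that are high quality and mutually diverse."""
        if not self.items or k <= 0:
            return []
        remaining = sorted(self.items, key=lambda t: t.quality, reverse=True)
        chosen = [remaining.pop(0)]  # start from the highest-quality template
        while remaining and len(chosen) < k:
            # Score = quality * distance to the nearest already-chosen template.
            best = max(
                remaining,
                key=lambda t: t.quality
                * min(cosine_distance(t.feature, c.feature) for c in chosen),
            )
            remaining.remove(best)
            chosen.append(best)
        return chosen
```

In a per-frame loop, a tracker built on this sketch would call `maybe_store` with the latest result, then `select(templates_needed(tracking_quality))` to obtain the templates for the next frame.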

PromptVT: Prompting for Efficient and Accurate Visual Tracking

Explicit Visual Prompts for Visual Object Tracking

Improving Visual Object Tracking through Visual Prompting

Visual Prompt Multi-Modal Tracking

PVT++: A Simple End-to-End Latency-Aware Visual Tracking Framework

LiteTrack: Layer Pruning with Asynchronous Feature Extraction for Lightweight and Efficient Visual Tracking

Self-Prompting Tracking: A Fast and Efficient Tracking Pipeline for UAV Videos

Mobile Vision Transformer-based Visual Object Tracking

Middle Fusion and Multi-Stage, Multi-Form Prompts for Robust RGB-T Tracking

HIPTrack: Visual Tracking with Historical Prompts

AViTMP: A Tracking-Specific Transformer for Single-Branch Visual Tracking

STMTrack: Template-free Visual Tracking with Space-time Memory Networks

Exploring Dynamic Transformer for Efficient Object Tracking

Efficient Visual Tracking via Hierarchical Cross-Attention Transformer

OT-VP: Optimal Transport-guided Visual Prompting for Test-Time Adaptation

Target-aware transformer tracking with hard occlusion instance generation

Distractor-Aware Event-Based Tracking

BACTrack: Building Appearance Collection for Aerial Tracking

Parallel Tracking and Verifying

Dynamic template updating Siamese network based on status feedback with quality evaluation for visual object tracking

Adaptively Bypassing Vision Transformer Blocks for Efficient Visual Tracking