Abstract: Training a deep-learning-based tracker on large datasets usually requires high computational power and significant time, so training is not always an option. In this paper, we propose a framework built on two ideas for Siamese-based trackers: (i) extending the number of templates in a way that removes the need to retrain the network, and (ii) a lightweight temporal network with a novel architecture that focuses on both local and global information and can be used independently of the tracker. Most Siamese-based trackers rely only on the first frame as the ground truth for the object and struggle when the target's appearance changes significantly in subsequent frames in the presence of similar distractors. Some trackers use multiple templates, but most of these rely on constant thresholds to update them, or replace templates that have low similarity scores only with more similar ones. Unlike previous works, we use adaptive thresholds that update the bag with similar templates as well as templates that are slightly diverse. Adaptive thresholds also yield an overall improvement over constant ones. In addition, mixing the feature maps obtained from each template in the last stage of the network removes the need to retrain trackers. Our proposed lightweight temporal network, CombiNet, learns the path history of different objects using only object coordinates and predicts the target's potential location in the next frame. It is tracker-independent, and applying it to new trackers requires no further training. With these ideas implemented, tracker performance improved on all datasets tested, including LaSOT, LaSOT extension, TrackingNet, OTB100, OTB50, UAV123 and UAV20L. Experiments indicate that the proposed framework works well with both convolutional and transformer-based trackers. The official Python code for this paper will be made publicly available upon publication.
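The abstract does not spell out the update rule or the fusion step, so the following is only a minimal sketch of how an adaptively thresholded template bag and a late fusion of per-template response maps might look. Everything in it is an assumption made for illustration: the `TemplateBag` class, the `fuse_response_maps` helper, the threshold rule (mean of recent similarity scores minus `alpha` standard deviations), and the element-wise mean used for fusion are hypothetical and are not taken from the paper.

```python
from collections import deque
from statistics import mean, pstdev


class TemplateBag:
    """Hypothetical bag of templates with an adaptive acceptance threshold."""

    def __init__(self, first_template, capacity=3, window=10, alpha=1.0):
        self.first = first_template            # initial template, never replaced
        self.dynamic = deque(maxlen=capacity)  # oldest dynamic template drops out
        self.scores = deque(maxlen=window)     # recent similarity scores
        self.alpha = alpha                     # tolerance band width (assumed)

    def threshold(self):
        # Assumed rule: mean of recent scores minus alpha * population std.
        if len(self.scores) < 2:
            return 0.0  # too little history, so accept any candidate
        return mean(self.scores) - self.alpha * pstdev(self.scores)

    def update(self, candidate, score):
        """Add the candidate if its score clears the current threshold."""
        accepted = score >= self.threshold()
        self.scores.append(score)  # every score feeds the running statistics
        if accepted:
            self.dynamic.append(candidate)
        return accepted

    def templates(self):
        return [self.first, *self.dynamic]


def fuse_response_maps(maps):
    """Element-wise mean of equally sized 2-D response maps (assumed fusion)."""
    n = len(maps)
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[sum(m[r][c] for m in maps) / n for c in range(cols)]
            for r in range(rows)]


if __name__ == "__main__":
    bag = TemplateBag("frame0", capacity=2)
    for name, score in [("f10", 0.9), ("f20", 0.8), ("f30", 0.2), ("f40", 0.5)]:
        print(name, "accepted" if bag.update(name, score) else "rejected")
    print(bag.templates())
```

Because the threshold follows the recent score statistics, a candidate somewhat below the running mean can still enter the bag, which is one way to admit slightly diverse templates. Fusing the outputs computed for each template, rather than changing the backbone, is what would leave the pretrained weights untouched in this sketch.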

TGAN: A Simple Model Update Strategy for Visual Tracking Via Template-Guidance Attention Network

Siamese Graph Attention Networks for Robust Visual Object Tracking

Generating Reliable Online Adaptive Templates for Visual Tracking

Channel Attention Based Generative Network for Robust Visual Tracking

Graph Attention Network for Context-Aware Visual Tracking

Graph Attention Tracking

Deformable Siamese Attention Networks for Visual Object Tracking

Siamese Tracking Network with Multi-attention Mechanism

Masked and Dynamic Siamese Network for Robust Visual Tracking

SGAT: Shuffle and graph attention based Siamese networks for visual tracking

SiamATL: Online Update of Siamese Tracking Network via Attentional Transfer Learning

Dynamic template updating Siamese network based on status feedback with quality evaluation for visual object tracking

SiamMGT: robust RGBT tracking via graph attention and reliable modality weight learning

GradNet: Gradient-Guided Network for Visual Object Tracking

Improving Siamese Based Trackers with Light or No Training through Multiple Templates and Temporal Network

Global-local feature-mixed network with template update for visual tracking

IoU-guided Siamese network with high-confidence template fusion for visual tracking

Adaptive distractor-aware for siamese tracking via enhancement confidence evaluator

SiamDMU: Siamese Dual Mask Update Network for Visual Object Tracking

TrTr: Visual Tracking with Transformer

Siamese Attentional Cascade Keypoints Network for Visual Object Tracking