Abstract: Advanced Siamese visual object tracking architectures are jointly trained using pair-wise input images to perform target classification and bounding box regression. They have achieved promising results in recent benchmarks and competitions. However, the existing methods suffer from two limitations: First, though the Siamese structure can estimate the target state in an instance frame, provided the target appearance does not deviate too much from the template, the detection of the target in an image cannot be guaranteed in the presence of severe appearance variations. Second, despite the classification and regression tasks sharing the same output from the backbone network, their specific modules and loss functions are invariably designed independently, without promoting any interaction. Yet, in a general tracking task, the centre classification and bounding box regression tasks are collaboratively working to estimate the final target location. To address the above issues, it is essential to perform target-agnostic detection so as to promote cross-task interactions in a Siamese-based tracking framework. In this work, we endow a novel network with a target-agnostic object detection module to complement the direct target inference, and to avoid or minimise the misalignment of the key cues of potential template-instance matches. To unify the multi-task learning formulation, we develop a cross-task interaction module to ensure consistent supervision of the classification and regression branches, improving the synergy of different branches. To eliminate potential inconsistencies that may arise within a multi-task architecture, we assign adaptive labels, rather than fixed hard labels, to supervise the network training more effectively. The experimental results obtained on several benchmarks, i.e., OTB100, UAV123, VOT2018, VOT2019, and LaSOT, demonstrate the effectiveness of the advanced target detection module, as well as the cross-task interaction, exhibiting superior tracking performance as compared with the state-of-the-art tracking methods.
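The contrast between fixed hard labels and adaptive labels can be illustrated with a minimal sketch. The abstract does not specify how the adaptive labels are computed, so the following is an assumption for illustration only: the classification target of each candidate location is set to the IoU between that location's regressed box and the ground-truth box, so that the classification branch is supervised consistently with the regression output. The function names `iou`, `hard_labels`, and `adaptive_labels` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Point = Tuple[float, float]              # (cx, cy)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def hard_labels(centres: List[Point], gt: Box) -> List[float]:
    """Fixed labels: 1 if a location's centre falls inside the ground-truth box."""
    return [
        1.0 if gt[0] <= cx <= gt[2] and gt[1] <= cy <= gt[3] else 0.0
        for cx, cy in centres
    ]


def adaptive_labels(predicted: List[Box], gt: Box) -> List[float]:
    """Assumed adaptive scheme: the label is the IoU of the regressed box with
    the ground truth, so classification confidence tracks regression quality."""
    return [iou(p, gt) for p in predicted]


gt_box: Box = (0.0, 0.0, 10.0, 10.0)
centres: List[Point] = [(5.0, 5.0), (10.0, 5.0), (25.0, 25.0)]
predicted: List[Box] = [
    (0.0, 0.0, 10.0, 10.0),    # perfectly regressed box
    (5.0, 0.0, 15.0, 10.0),    # box shifted by half its width
    (20.0, 20.0, 30.0, 30.0),  # box that misses the target
]

hard = hard_labels(centres, gt_box)
adaptive = adaptive_labels(predicted, gt_box)
```

In this toy case the second location receives the same hard label as the first, although its regressed box overlaps the target far less; under the assumed IoU-based scheme its label is reduced accordingly, which is the kind of inconsistency between branches that adaptive labelling is meant to remove.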

Joint Classification and Regression for Visual Tracking with Fully Convolutional Siamese Networks

SiamRCR: Reciprocal Classification and Regression for Visual Object Tracking

Multitarget Tracking Using Siamese Neural Networks

The Multi-task Fully Convolutional Siamese Network with Correlation Filter Layer for Real-Time Visual Tracking

Siamese Centerness Prediction Network for Real-Time Visual Object Tracking

Toward Robust Visual Object Tracking With Independent Target-Agnostic Detection and Effective Siamese Cross-Task Interaction

SiamMan: Siamese Motion-aware Network for Visual Tracking

Siamese Tracking Network with Spatial-Semantic-Aware Attention and Flexible Spatiotemporal Constraint

Learning Motion-Perceive Siamese network for robust visual object tracking

SiamBAN: Target-Aware Tracking With Siamese Box Adaptive Network

Discriminative and Robust Online Learning for Siamese Visual Tracking

Siamese Residual Network for Efficient Visual Tracking

SiamCPN: Visual tracking with the Siamese center-prediction network

Local to Global Tracker: A Siamese Network for Long-term Tracking

Siamese Attentional Cascade Keypoints Network for Visual Object Tracking

Object Tracking Algorithm Based on Channel-interconnection-spatial Attention Mechanism and Siamese Region Proposal Network

Mutual Learning and Feature Fusion Siamese Networks for Visual Object Tracking

SiamMFC: Visual Object Tracking Based on Mainfold Full Convolution Siamese Network

Learning Temporal-Correlated and Channel-Decorrelated Siamese Networks for Visual Tracking

Real-time object tracking in the wild with Siamese network

Evolution of Siamese Visual Tracking with Slot Attention