Abstract: In object tracking, complex backgrounds pose significant challenges and often cause existing trackers to underperform, because most of them rely predominantly on deep features from the final layer of the feature extraction network. These features, while semantically rich, are not always sufficient to distinguish targets from cluttered environments. To address this limitation, we introduce the Background-aware Siamese Network (BASNet), which enhances the salience of features in complex scenes. Central to our method is the Dual-Feature Fusion (DFF) model, designed to stabilize the target representation against distracting backgrounds. BASNet combines the complementary strengths of deep and shallow features: the semantic richness of the former and the precise spatial localization of the latter. This fusion improves feature utilization and mitigates the shortcomings associated with the superficial output of shallow networks. To further refine the target's spatial definition, our attention-focusing (AF) module accentuates pertinent features and attenuates the noise introduced by multi-layer feature fusion. The module applies cyclic attention operations, which let each pixel establish dependencies with all other pixels and thereby highlight salient features. For precise target localization, BASNet employs the SIoU loss in its regression branch. Extensive experiments demonstrate the competitive performance of our method on six datasets: OTB50, OTB100, UAV123, VOT2016, VOT2018, and VOT2019.
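The abstract does not specify how the DFF model combines the two feature levels, so the following is only a minimal sketch of one common way to fuse a shallow and a deep feature map. It assumes nearest-neighbour upsampling of the deep map, channel-wise concatenation, and a 1x1 projection. The function names `upsample_nearest` and `fuse_features`, the tensor shapes, and the random projection weights are all hypothetical and are not taken from BASNet.

```python
import numpy as np

def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map by an integer factor."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse_features(shallow, deep, weights):
    """Fuse a shallow (Cs, H, W) map with a deep (Cd, H/f, W/f) map.

    Assumed scheme (not the paper's DFF): upsample the deep map to the
    shallow resolution, concatenate along channels, then apply a 1x1
    projection given by `weights` of shape (Cout, Cs + Cd).
    """
    factor = shallow.shape[1] // deep.shape[1]
    deep_up = upsample_nearest(deep, factor)
    stacked = np.concatenate([shallow, deep_up], axis=0)
    # A 1x1 convolution is a per-pixel linear map over the channel axis.
    return np.einsum("oc,chw->ohw", weights, stacked)

rng = np.random.default_rng(0)
shallow = rng.standard_normal((8, 16, 16))   # high resolution, few channels
deep = rng.standard_normal((32, 4, 4))       # low resolution, many channels
weights = rng.standard_normal((16, 8 + 32))  # hypothetical projection weights
fused = fuse_features(shallow, deep, weights)
print(fused.shape)
```

In this sketch the fused map keeps the spatial resolution of the shallow features while every output channel mixes shallow and deep channels, which is the general idea behind combining spatial precision with semantic content.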

Background-aware Siamese Network Tracking Based on Salient Feature Fusion
