Abstract: Siamese networks have been applied in various fields, notably object tracking, owing to their speed and accuracy. Siamese trackers rely on cross-correlation to compute a similarity score between the target template and the search region. However, cross-correlation is a local matching operation and cannot effectively capture global context. Transformer-based feature fusion captures long-range dependencies and richer semantic information, but distinguishing the target from the background also requires localized edge information. Because cross-correlation fusion and Transformer fusion have complementary strengths, we combine them in a dual feature fusion tracker (SiamCT) that captures both the local correlations and the global dependencies between the target and the search region. Specifically, we construct two parallel feature fusion paths, one based on cross-correlation and one based on the Transformer. For cross-correlation fusion, we adopt the more efficient two-dimension pixel-wise cross-correlation (TDPC), which performs correlation along both the spatial and channel dimensions; this interaction of multidimensional information helps to achieve more accurate feature fusion. The fused features are then augmented by coordinate attention (CA), which supplies orientation-dependent positional information. For Transformer fusion, we introduce cos-based linear attention (ClA) to improve the Transformer’s ability to acquire global context information. Extensive experiments show that SiamCT outperforms existing leading methods on the GOT-10k, LaSOT, TrackingNet, and OTB100 benchmarks. In particular, on the GOT-10k benchmark it reaches an AO score of 70.6% and $SR_{0.5}$ and $SR_{0.75}$ scores of 80.5% and 65.9%, respectively, achieving state-of-the-art performance.
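The two parallel fusion paths described in the abstract can be illustrated with a minimal sketch. It is not the SiamCT implementation: features are assumed to be flattened to one row per pixel, the local path uses only a plain spatial pixel-wise correlation (the channel-dimension half of TDPC and the coordinate attention step are omitted), the global path swaps in standard softmax cross-attention in place of the paper's cos-based linear attention, and the final concatenation is an assumed way to merge the two paths. All function names are hypothetical.

```python
import math


def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def pixelwise_correlation(template, search):
    """Local path: every template pixel acts as a 1x1 kernel on the search region.

    template: Nz x C (one row per template pixel)
    search:   Nx x C (one row per search pixel)
    returns:  Nz x Nx similarity map
    """
    return matmul(template, transpose(search))


def cross_attention(search, template):
    """Global path (assumption: standard softmax attention, not ClA).

    Queries come from the search region; keys and values come from the template.
    returns: Nx x C
    """
    scale = 1.0 / math.sqrt(len(search[0]))
    scores = matmul(search, transpose(template))  # Nx x Nz
    weights = [softmax([s * scale for s in row]) for row in scores]
    return matmul(weights, template)


def dual_fusion(template, search):
    """Concatenate local and global responses per search pixel (assumed merge)."""
    local = transpose(pixelwise_correlation(template, search))  # Nx x Nz
    global_ctx = cross_attention(search, template)              # Nx x C
    return [l + g for l, g in zip(local, global_ctx)]           # Nx x (Nz + C)


if __name__ == "__main__":
    template = [[1.0, 0.0], [0.0, 1.0]]            # 2 template pixels, 2 channels
    search = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 search pixels, 2 channels
    for row in dual_fusion(template, search):
        print(row)
```

Each output row belongs to one search pixel: the first `Nz` entries are its correlation with every template pixel (local matching), and the remaining `C` entries are the template features aggregated by attention (global context).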

SiamUT: Siamese Unsymmetrical Transformer-like Tracking

CTT: CNN Meets Transformer for Tracking

DASTSiam: Spatio‐temporal Fusion and Discriminative Enhancement for Siamese Visual Tracking

SiamSGA: Siamese Symmetric Graph Attention Tracking

SiamDA: Dual Attention Siamese Network for Real-Time Visual Tracking

Siamese Network with Transformer and Saliency Encoder for Object Tracking

Unveil the Potential of Siamese Framework for Visual Tracking

Dual Feature Fusion Tracking with Combined Cross-Correlation and Transformer

Siamese transformer RGBT tracking

Siamese Tracking Network with Spatial-Semantic-Aware Attention and Flexible Spatiotemporal Constraint

Siamese Tracking Network with Multi-attention Mechanism

IMSiam: IoU-aware Matching-adaptive Siamese Network for Object Tracking

Deformable Siamese Attention Networks for Visual Object Tracking

Anchor-free Siamese Network Based on Visual Tracking

Online Visual Tracking Via Cross-Similarity-based Siamese Network

SiamST: Siamese Network with Spatio-Temporal Awareness for Object Tracking

DASTSiam: Spatio-Temporal Fusion and Discriminative Augmentation for Improved Siamese Tracking

SiamHAS: Siamese Tracker with Hierarchical Attention Strategy for Aerial Tracking

Siamese Visual Tracking with Dual-Pipeline Correlated Fusion Network

Evolution of Siamese Visual Tracking with Slot Attention

A Twofold Siamese Network for Real-Time Object Tracking