Abstract: Reliable feature interaction plays a critical role in visual tracking, especially in the now-dominant Siamese-based tracking paradigm. In general, there are two primary approaches for fusing representations from the template and the search area in the Siamese setting, i.e., cross-correlation and transformer modeling. The former provides a straightforward interaction solution but may have limitations in handling complex scenarios such as appearance variations and occlusion. The latter offers an effective interaction mechanism, albeit at a higher computational complexity and model cost. In contrast to traditional Siamese-based trackers that rely on these two feature-fusion operators, this paper proposes a novel Correlation-Refine network that addresses, from both spatial and channel perspectives, the lack of semantic information caused by the local linear matching in correlation. The Correlation-Refine network (named CR) is built solely on fully convolutional layers, without intricate transformer mechanisms or complex methods for fusing multi-scale features. Moreover, we present a concise yet effective convolutional tracking framework based on the CR network. The CR network increases the discriminability of semantic information in a coarse-to-fine manner: by stacking multiple CR layers, it gradually learns the semantic features of the target to be tracked and suppresses interference from similar objects. Extensive experiments and comparisons with recent competitive trackers on challenging large-scale benchmarks demonstrate that our tracker outperforms all previous convolutional trackers and achieves results competitive with transformer-based methods. The code will be made available.
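The abstract attributes the missing semantic information to the local linear matching performed by cross-correlation. As a rough illustration of what that matching computes, the sketch below implements depth-wise cross-correlation (the per-channel variant used in SiamRPN++) with explicit NumPy loops. It is not the CR network; the function name `depthwise_xcorr`, the `(C, H, W)` array layout, and the stride-1 "valid" windowing are assumptions chosen for clarity rather than speed.

```python
import numpy as np


def depthwise_xcorr(search: np.ndarray, template: np.ndarray) -> np.ndarray:
    """Depth-wise cross-correlation of template features over search features.

    Hypothetical helper for illustration (not from the paper).
    search:   (C, Hs, Ws) feature map of the search region
    template: (C, Ht, Wt) feature map of the target template
    returns:  (C, Hs - Ht + 1, Ws - Wt + 1) response map, one channel per
              feature channel ("valid" sliding window, stride 1, no padding)
    """
    c, hs, ws = search.shape
    ct, ht, wt = template.shape
    if c != ct:
        raise ValueError("search and template must have the same channel count")
    ho, wo = hs - ht + 1, ws - wt + 1
    out = np.empty((c, ho, wo), dtype=np.result_type(search, template))
    for i in range(ho):
        for j in range(wo):
            window = search[:, i:i + ht, j:j + wt]
            # Each response value is a purely linear match: the elementwise
            # product of the local window with the template, summed per channel.
            out[:, i, j] = (window * template).sum(axis=(1, 2))
    return out


if __name__ == "__main__":
    template = np.array([[[1.0, 2.0], [3.0, 4.0]]])  # (1, 2, 2)
    search = np.zeros((1, 5, 5))
    search[0, 2:4, 1:3] = template[0]  # place the target at row 2, col 1
    response = depthwise_xcorr(search, template)
    peak = np.unravel_index(response[0].argmax(), response[0].shape)
    print(response.shape, peak, response[0, 2, 1])
```

Because the score is an unnormalized inner product, a copy of the template with larger feature magnitude elsewhere in the search area (for example `2 * template`) produces a higher peak than the true target. This is one simple way to see why purely linear, local matching can be confused by similar-looking objects, which is the weakness the CR layers are described as refining.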

Residual Attention SiameseRPN for Visual Tracking

Siamese Refine Polar Mask Prediction Network for Visual Tracking

Background-aware Siamese Network Tracking Based on Salient Feature Fusion

Siamese Residual Network for Efficient Visual Tracking

Siamese Attentional Cascade Keypoints Network for Visual Object Tracking

Siamese Centerness Prediction Network for Real-Time Visual Object Tracking

An object tracking framework with recapture based on correlation filters and Siamese networks

The Multi-task Fully Convolutional Siamese Network with Correlation Filter Layer for Real-Time Visual Tracking

Siamese Tracking Network with Spatial-Semantic-Aware Attention and Flexible Spatiotemporal Constraint

End-to-end multitask Siamese network with residual hierarchical attention for real-time object tracking

Discriminative and Robust Online Learning for Siamese Visual Tracking

CRTrack: Learning Correlation-Refine network for visual object tracking

High Performance Visual Tracking with Siamese Region Proposal Network

Distractor-aware Siamese Networks for Visual Object Tracking

Deformable Siamese Attention Networks for Visual Object Tracking

Siamese Network Object Tracking Algorithm Combining Attention Mechanism and Correlation Filter Theory

Object Tracking Algorithm Based on Channel-interconnection-spatial Attention Mechanism and Siamese Region Proposal Network

Visual object tracking based on residual network and cascaded correlation filters

Target-Aware Deep Tracking

Siamese-Based Attention Learning Networks for Robust Visual Object Tracking

Visual Tracking With Siamese Network Based on Fast Attention Network