Abstract: Accurate visual object tracking requires capturing reliable correlations between the tracking target and the search region. However, the dominant Siamese-based trackers produce dense similarity maps in a single pass via a cross-correlation operation and do nothing to remedy the contamination caused by erroneous or ambiguous matches. In this paper, we propose a novel tracker, termed neighborhood consensus constraint-based Siamese tracker (NCSiam), which uses a neighborhood consensus constraint to refine the resulting correlation maps. The intuition behind our approach is that nearby erroneous or ambiguous matches can be resolved by analyzing a larger context of the scene that contains a unique match. Specifically, we devise a 4D convolution-based multi-level similarity refinement (MLSR) strategy. Taking the primary similarity maps obtained from cross-correlation as input, MLSR acquires reliable matches by analyzing neighborhood consensus patterns in 4D space, thus enhancing the discriminability between the tracking target and distractors. In addition, traditional Siamese-based trackers perform classification and regression directly on similarity response maps, which discard appearance and semantic information. Therefore, an appearance affinity decoder (AAD) is developed to take full advantage of the semantic information of the search region. To further improve performance, we design a task-specific disentanglement (TSD) module that decouples the learned representations into classification-specific and regression-specific embeddings. Extensive experiments are conducted on six challenging benchmarks: GOT-10k, TrackingNet, LaSOT, UAV123, OTB2015, and VOT2020. The results demonstrate the effectiveness of our method. The code will be available at https://github.com/laybebe/NCSiam.
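
The neighborhood consensus idea in the abstract can be illustrated with a small, self-contained sketch. It is not the MLSR module: the learned 4D convolutions are replaced here by a single hand-picked averaging kernel that rewards matches whose spatial neighbors agree on the same displacement. The names `correlation_4d` and `consensus_refine`, the one-hot toy features, and the kernel itself are all assumptions made for illustration.

```python
import math
from itertools import product


def cosine(u, v):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def correlation_4d(template, search):
    """Dense 4D similarity: one score per (template cell, search cell) pair."""
    ht, wt = len(template), len(template[0])
    hs, ws = len(search), len(search[0])
    return {
        (i, j, k, l): cosine(template[i][j], search[k][l])
        for i, j, k, l in product(range(ht), range(wt), range(hs), range(ws))
    }


def consensus_refine(corr, radius=1):
    """Hypothetical fixed 4D kernel: average each match with the neighboring
    matches that share its displacement (same shift in template and search).
    A learned 4D convolution would discover such patterns from data."""
    shifts = list(product(range(-radius, radius + 1), repeat=2))
    refined = {}
    for (i, j, k, l) in corr:
        support = [
            corr[(i + di, j + dj, k + di, l + dj)]
            for di, dj in shifts
            if (i + di, j + dj, k + di, l + dj) in corr
        ]
        refined[(i, j, k, l)] = sum(support) / len(support)
    return refined


def one_hot(index, size=5):
    return [1.0 if n == index else 0.0 for n in range(size)]


# Toy features: four distinct target parts plus a background vector.
A, B, C, D, BG = (one_hot(n) for n in range(5))
template = [[A, B], [C, D]]

# 4x4 search region: the full target sits at rows 1-2, cols 1-2.
search = [[BG] * 4 for _ in range(4)]
search[1][1], search[1][2] = A, B
search[2][1], search[2][2] = C, D
search[0][3] = A  # distractor: looks like one target part, lacks its context

raw = correlation_4d(template, search)
refined = consensus_refine(raw)

true_match, distractor = (0, 0, 1, 1), (0, 0, 0, 3)
print("raw:    ", raw[true_match], raw[distractor])
print("refined:", refined[true_match], refined[distractor])
```

In this toy setup the raw correlation cannot tell the true match from the distractor, since both score identically. After refinement the true match keeps its score because its neighbors agree with it, while the distractor's score drops because its neighbors do not.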

SiamCPN: Visual tracking with the Siamese center-prediction network

Siamese Centerness Prediction Network for Real-Time Visual Object Tracking

Siamese Refine Polar Mask Prediction Network for Visual Tracking

PointSiamRCNN: Target-aware Voxel-based Siamese Tracker for Point Clouds

Siamese Attentional Cascade Keypoints Network for Visual Object Tracking

SiamBAN: Target-Aware Tracking With Siamese Box Adaptive Network

SiamCorners: Siamese Corner Networks for Visual Tracking

Siamese anchor-free object tracking with multiscale spatial attentions

SiamOAN: Siamese object-aware network for real-time target tracking

Visual Tracking With Siamese Network Based on Fast Attention Network

Multitarget Tracking Using Siamese Neural Networks

NCSiam: Reliable Matching Via Neighborhood Consensus for Siamese-Based Object Tracking

Real-time object tracking in the wild with Siamese network

Object Tracking Algorithm Based on Channel-interconnection-spatial Attention Mechanism and Siamese Region Proposal Network

Siamese Tracking Network with Spatial-Semantic-Aware Attention and Flexible Spatiotemporal Constraint

SiamMan: Siamese Motion-aware Network for Visual Tracking

Faster and Simpler Siamese Network for Single Object Tracking

SiamCAM: A Real-Time Siamese Network for Object Tracking with Compensating Attention Mechanism

Hierarchical correlation siamese network for real-time object tracking

IoU-guided Siamese region proposal network for real-time visual tracking

Joint Classification and Regression for Visual Tracking with Fully Convolutional Siamese Networks