SiamS3C: spatial-channel cross-correlation for visual tracking with centerness-guided regression

Sangaiah, Arun Kumar
DOI: https://doi.org/10.1007/s00530-024-01450-5
IF: 3.9
2024-08-22
Multimedia Systems
Abstract: Visual object tracking can be divided into object classification and bounding-box regression tasks, but sharing a single correlation map between the two leads to inaccuracy. Siamese trackers compute the correlation map with a cross-correlation operation that has a high computational cost, and performing this operation only on channels or only in the spatial domain results in weak perception of global information. In addition, some Siamese trackers with a centerness branch ignore the association between the centerness branch and the bounding-box regression branch. To alleviate these problems, we propose a visual object tracker based on Spatial-Channel Cross-Correlation and Centerness-Guided Regression. Firstly, we propose a spatial-channel cross-correlation module (SC3M) that combines the search region feature and the template feature both on channels and in the spatial domain, which suppresses the interference of distractors. As a lightweight module, SC3M computes two independent correlation maps that are fed to different subnetworks. Secondly, we propose a centerness-guided regression subnetwork consisting of the centerness branch and the bounding-box regression branch. The centerness guides the whole regression subnetwork to strengthen the association between the two branches and further suppress low-quality predicted bounding boxes. Thirdly, we conduct extensive experiments on five challenging benchmarks: GOT-10k, VOT2018, TrackingNet, OTB100 and UAV123. The results show that our tracker performs strongly and meets the real-time requirement at 48.52 fps.
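The abstract does not give the internals of SC3M or of the centerness-guided subnetwork, so the following is only an illustrative sketch of the generic building blocks it refers to, not the authors' implementation. It assumes a channel-wise (depthwise) cross-correlation, a spatial (pixel-wise) correlation in which each template location acts as a 1x1 kernel, and an FCOS-style centerness score computed from the distances to the four box sides. The function names, tensor layouts, and the use of NumPy are all assumptions made for illustration.

```python
import numpy as np


def depthwise_xcorr(search, template):
    """Channel-wise cross-correlation (assumed variant, valid padding).

    search:   (C, Hs, Ws) search-region feature
    template: (C, Ht, Wt) template feature
    returns:  (C, Hs-Ht+1, Ws-Wt+1), one response map per channel
    """
    c, hs, ws = search.shape
    ct, ht, wt = template.shape
    assert c == ct, "search and template must have the same channel count"
    ho, wo = hs - ht + 1, ws - wt + 1
    out = np.zeros((c, ho, wo), dtype=np.float64)
    for i in range(ho):
        for j in range(wo):
            # Slide the template over the search feature, channel by channel.
            window = search[:, i:i + ht, j:j + wt]
            out[:, i, j] = (window * template).sum(axis=(1, 2))
    return out


def pixelwise_xcorr(search, template):
    """Spatial correlation (assumed variant): every template location is a 1x1 kernel.

    returns: (Ht*Wt, Hs, Ws), one response map per template location
    """
    c = search.shape[0]
    kernels = template.reshape(c, -1).T  # (Ht*Wt, C)
    return np.einsum("kc,chw->khw", kernels, search)


def centerness(l, t, r, b):
    """FCOS-style centerness from distances to the left, top, right, bottom sides."""
    lr = np.minimum(l, r) / np.maximum(l, r)
    tb = np.minimum(t, b) / np.maximum(t, b)
    return np.sqrt(lr * tb)
```

Under these assumptions, a location at the exact center of a box (equal distances to opposite sides) gets centerness 1, and the score falls toward 0 near the box border, which is what lets a centerness score down-weight low-quality box predictions.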
computer science, information systems, theory & methods