Accurate target estimation with image contents for visual tracking

Sheng Wang, Xi Chen, Jia Yan
DOI: https://doi.org/10.1007/s11042-024-18869-7
IF: 2.577
2024-04-07
Multimedia Tools and Applications
Abstract: Siamese-like trackers have recently performed very well. Most of these methods use classification scores and quality assessment scores to estimate a target. However, their classification scores correlate poorly with target location estimation, and quality scores produced by a simple strategy improve that correlation only to a limited extent, which hurts tracking ability. To alleviate this problem, we propose a simple Siamese target estimation method for object tracking that uses image contents (luminance, contrast, and structure). Specifically, we first employ image contents involving the target to generate similarity scores with SSIM (Structural Similarity Index Measure) in a similarity branch, which helps the classification branch improve target estimation by taking the whole target context into account. Second, we assign different weights to the classification branch and the similarity branch during inference to ease the low correlation, which gives more flexibility in target location estimation. Our tracker achieves competitive performance on three challenging benchmarks (OTB100, GOT-10k, and TrackingNet) at real-time speed, demonstrating the effectiveness of our method. In particular, our tracker outperforms the leading baseline by over 6.0% in SR score on the GOT-10k benchmark while still running at 67 FPS.
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering
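The two ideas in the abstract, an SSIM-based similarity score and a weighted combination of the classification and similarity branches at inference, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the fusion rule or the weight values, so the linear fusion, the `cls_weight` parameter, and the function names `ssim_global`, `fuse_scores`, and `best_candidate` are all assumptions made for illustration. The SSIM here is the standard single-window form (combining luminance, contrast, and structure) computed over a whole patch, whereas a tracker would more likely compute it on feature maps or sliding windows.

```python
def ssim_global(x, y, dynamic_range=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two equal-length lists of pixel intensities.

    Uses the standard simplified form
        ((2*mu_x*mu_y + C1) * (2*cov_xy + C2)) /
        ((mu_x^2 + mu_y^2 + C1) * (var_x + var_y + C2)),
    which merges the luminance, contrast, and structure terms.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("patches must be non-empty and the same size")
    n = len(x)
    # Stabilizing constants from the usual SSIM definition.
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def fuse_scores(cls_scores, sim_scores, cls_weight=0.5):
    """Assumed linear fusion of per-candidate scores from the two branches."""
    if len(cls_scores) != len(sim_scores):
        raise ValueError("both branches must score the same candidates")
    return [
        cls_weight * c + (1.0 - cls_weight) * s
        for c, s in zip(cls_scores, sim_scores)
    ]


def best_candidate(fused_scores):
    """Index of the candidate with the highest fused score."""
    return max(range(len(fused_scores)), key=fused_scores.__getitem__)
```

In this sketch, a candidate with a slightly lower classification score can still be selected when its content matches the template more closely, which is the kind of flexibility the abstract attributes to weighting the two branches. The weight would need to be tuned per benchmark; the value 0.5 is only a placeholder.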