Robust Visual Tracking Via Multi-Scale Spatio-Temporal Context Learning

Wanli Xue, Chao Xu, Zhiyong Feng
DOI: https://doi.org/10.1109/tcsvt.2017.2720749
IF: 5.859
2018-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: To address the incompleteness and inaccuracy of the samples used in most tracking-by-detection algorithms, this paper presents an object tracking algorithm termed multi-scale spatio-temporal context (MSTC) learning tracking. MSTC collaboratively explores three types of spatio-temporal context, namely the long-term historical targets, the medium-term stable scene (i.e., a short, continuous, and stable video sequence), and the short-term overall samples, to improve tracking efficiency and reduce drift. Unlike the conventional multi-timescale tracking paradigm, which chooses samples in a fixed manner, MSTC formulates a low-dimensional representation, named the fast perceptual hash algorithm, to dynamically update the long-term historical targets and the medium-term stable scene according to image similarity. MSTC also differs from most tracking-by-detection algorithms, which label samples as positive or negative: it investigates fusion salient sample detection, which weights the samples not only by distance information but also by visual spatial attention cues such as color, intensity, and texture. Extensive experimental evaluations against most state-of-the-art algorithms on the standard 50-video benchmark demonstrate the superiority of the proposed algorithm.
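The abstract does not spell out how the fast perceptual hash is computed, so the following is only a minimal sketch of the general idea of comparing image patches through a compact hash. It uses a generic average hash, which is a stand-in and not necessarily the authors' formulation. The function names `average_hash` and `hash_similarity`, the 8x8 hash size, and the synthetic test patches are all assumptions made for illustration.

```python
import numpy as np


def average_hash(gray, hash_size=8):
    """Return a boolean fingerprint of length hash_size**2 for a 2-D grayscale patch.

    Generic average hash (an assumed stand-in, not the paper's algorithm):
    downsample by block averaging, then threshold at the mean.
    """
    gray = np.asarray(gray, dtype=np.float64)
    h, w = gray.shape
    if h < hash_size or w < hash_size:
        raise ValueError("patch must be at least hash_size pixels per side")
    # Crop so both dimensions divide evenly into hash_size blocks.
    h_crop, w_crop = h - h % hash_size, w - w % hash_size
    blocks = gray[:h_crop, :w_crop].reshape(
        hash_size, h_crop // hash_size, hash_size, w_crop // hash_size
    )
    small = blocks.mean(axis=(1, 3))  # hash_size x hash_size block means
    return (small > small.mean()).ravel()


def hash_similarity(hash_a, hash_b):
    """Fraction of matching bits between two hashes, in [0, 1]."""
    return 1.0 - np.count_nonzero(hash_a != hash_b) / hash_a.size


# Usage: compare a stored target patch with a slightly perturbed candidate.
rng = np.random.default_rng(0)
target = rng.random((64, 64))
candidate = target + 0.01 * rng.standard_normal((64, 64))
print(hash_similarity(average_hash(target), average_hash(candidate)))
```

In a tracker of the kind the abstract describes, a similarity score like this could decide whether a new frame is close enough to the stored long-term targets or the medium-term stable scene to update them. The decision threshold would have to be chosen empirically and is not given in the abstract.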