STGL: Spatial-Temporal Graph Representation and Learning for Visual Tracking

Bo Jiang, Yuan Zhang, Bin Luo, Xiaochun Cao, Jin Tang
DOI: https://doi.org/10.1109/tmm.2020.3008035
IF: 7.3
2021-01-01
IEEE Transactions on Multimedia
Abstract: The tracking-by-detection framework is commonly adopted in visual tracking methods. It aims to localize the visual target object with a bounding box. However, a bounding box often cannot describe the target object accurately and thus easily introduces noisy background information, which usually degrades the final tracking results. Recently, weighted patch representations of the object have been shown to be very effective at suppressing undesirable background information and can thus clearly improve tracking results. In this paper, we propose a novel Spatial-Temporal Graph representation and Learning (STGL) model to generate a robust target representation for the visual tracking problem. The main aspect of STGL is that it exploits both the spatial (within each frame) and temporal (between consecutive frames) structure of patches simultaneously in a unified graph representation and semi-supervised learning model. Compared with existing works, STGL naturally exploits the learned representation of the object in the previous frame and can thus obtain the representation of the object in the current frame more accurately and robustly. A new ADMM algorithm is derived to solve the proposed STGL model. Based on the proposed object representation, we then adapt the structured SVM by introducing scale estimation to achieve object tracking. Extensive experiments show that our method outperforms the state-of-the-art patch-based tracking methods on two standard benchmark datasets.
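The idea of weighting patches with both spatial and temporal cues can be illustrated with a small sketch. This is not the STGL model or its ADMM solver, neither of which is specified in the abstract; it is a generic quadratic graph-regularization stand-in, solved by simple Jacobi-style iteration. The function names (`grid_edges`, `patch_weights`), the Gaussian edge affinity, and the parameters `mu`, `lam`, and `sigma` are all assumptions made for illustration. Each patch weight is pulled toward its spatial neighbours in the current frame, toward an initial seed label, and toward the weight the same patch had in the previous frame.

```python
import math


def grid_edges(rows, cols):
    """Return 4-neighbour spatial edges for patches on a rows x cols grid."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                edges.append((i, i + 1))      # right neighbour
            if r + 1 < rows:
                edges.append((i, i + cols))   # bottom neighbour
    return edges


def patch_weights(features, edges, seeds, prev_weights,
                  mu=1.0, lam=0.5, sigma=1.0, iters=200):
    """Hypothetical patch weighting by graph regularization.

    Minimizes, over the weight vector w,
        sum_{(i,j)} a_ij (w_i - w_j)^2      (spatial smoothness)
      + mu  * sum_i (w_i - seeds_i)^2       (fit to initial labels)
      + lam * sum_i (w_i - prev_i)^2        (temporal consistency)
    using fixed-point updates derived from the stationarity condition.
    """
    n = len(features)
    neighbours = [[] for _ in range(n)]
    for i, j in edges:
        d2 = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
        a_ij = math.exp(-d2 / (2.0 * sigma ** 2))  # assumed Gaussian affinity
        neighbours[i].append((j, a_ij))
        neighbours[j].append((i, a_ij))

    w = list(seeds)
    for _ in range(iters):
        # Each new weight is a weighted average of neighbours, seed, and prior.
        w = [
            (sum(a * w[j] for j, a in neighbours[i])
             + mu * seeds[i] + lam * prev_weights[i])
            / (sum(a for _, a in neighbours[i]) + mu + lam)
            for i in range(n)
        ]
    return w


# Toy 3x3 patch grid: only the centre patch (index 4) looks like the target.
features = [[0.0]] * 4 + [[1.0]] + [[0.0]] * 4
seeds = [0.0] * 4 + [1.0] + [0.0] * 4
prev = [0.0] * 4 + [1.0] + [0.0] * 4
edges = grid_edges(3, 3)

weights = patch_weights(features, edges, seeds, prev)
weights_no_prior = patch_weights(features, edges, seeds, [0.0] * 9)
```

In this toy setup the centre patch receives the largest weight, and dropping the previous-frame prior (`weights_no_prior`) lowers it, which mirrors in a simplified way the abstract's point that the previous frame's representation helps estimate the current one.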
computer science, information systems, telecommunications, software engineering