Learning Multiple Instance Deep Quality Representation for Robust Object Tracking.
Guan Wang, Jing Liu, Wei Lo, Chun-Ming Yang
DOI: https://doi.org/10.1016/j.future.2020.07.024
IF: 7.307
2020-01-01
Future Generation Computer Systems
Abstract: Robustly tracking objects in a video stream that contains complex objects and backgrounds is a useful technique for next-generation computer vision systems. In practice, however, it is difficult to design a successful video-based object tracking system because of varying lighting conditions, possible occlusions, and fast-moving objects. In this work, a novel weakly-supervised, quality-guided visual object tracking model is proposed. Its key component is a bidirectional long short-term memory recurrent neural network (BLSTM-RNN) that captures the feature sequence and predicts the quality score of each candidate window. More specifically, given a rich set of training videos annotated with the target objects, a weakly-supervised learning algorithm is first used to project all candidate window features onto the semantic space. Next, a two-stage algorithm that applies both shallow and deep filtering operations selects the key frames from the video sequences. The BLSTM-RNN then characterizes the feature sequence temporally, and from its output the most likely object window is calculated and returned. For the experiments, a large video dataset containing 2045 NBA regular-season and playoff basketball games was compiled. On this dataset, a comparative study is conducted between the proposed algorithm and state-of-the-art video tracking methods. Extensive visualization results and comparisons of tracking precision show that the proposed method is competitive.
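The quality-scoring step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no layer sizes, training procedure, or feature definition, so the sketch assumes one feature vector per candidate window, small untrained random weights, and a sigmoid output read as the quality score. All names (`LSTMCell`, `run`, `blstm_quality_scores`) are hypothetical. It shows only the bidirectional idea: one LSTM reads the feature sequence forward, a second reads it backward, and the two hidden states at each position are combined into a score.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector (list of floats).
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class LSTMCell:
    """A plain LSTM cell with untrained random weights (illustrative only)."""

    def __init__(self, input_size, hidden_size, rng):
        n = input_size + hidden_size
        self.hidden_size = hidden_size
        # One weight matrix and bias per gate:
        # i = input, f = forget, o = output, c = candidate cell state.
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(n)]
                      for _ in range(hidden_size)] for g in "ifoc"}
        self.b = {g: [0.0] * hidden_size for g in "ifoc"}

    def step(self, x, h, c):
        z = x + h  # concatenate the input with the previous hidden state
        pre = {g: [a + b for a, b in zip(matvec(self.W[g], z), self.b[g])]
               for g in "ifoc"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["c"]]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


def run(cell, seq):
    # Run the cell over the sequence and return the hidden state at each step.
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    states = []
    for x in seq:
        h, c = cell.step(x, h, c)
        states.append(h)
    return states


def blstm_quality_scores(seq, fwd, bwd, w_out, b_out):
    """Return one score in (0, 1) per element of the feature sequence."""
    h_fwd = run(fwd, seq)
    # Read the sequence in reverse, then restore the original order.
    h_bwd = run(bwd, seq[::-1])[::-1]
    scores = []
    for a, b in zip(h_fwd, h_bwd):
        feat = a + b  # concatenate forward and backward hidden states
        scores.append(sigmoid(sum(w * v for w, v in zip(w_out, feat)) + b_out))
    return scores


# Toy usage: 5 candidate windows, each with a 4-dimensional feature vector.
rng = random.Random(0)
input_size, hidden_size = 4, 3
fwd = LSTMCell(input_size, hidden_size, rng)
bwd = LSTMCell(input_size, hidden_size, rng)
w_out = [rng.uniform(-0.1, 0.1) for _ in range(2 * hidden_size)]
seq = [[rng.random() for _ in range(input_size)] for _ in range(5)]

scores = blstm_quality_scores(seq, fwd, bwd, w_out, 0.0)
best = max(range(len(scores)), key=scores.__getitem__)  # highest-scoring window
```

Because the weights are random, the scores here carry no meaning; in a trained model, `best` would index the candidate window returned as the tracking result. Pure Python lists are used only to keep the sketch dependency-free.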