Abstract: Convolutional Neural Networks (CNNs) have shown outstanding performance in visual object tracking. However, most classification-based tracking methods that use CNNs are time-consuming because of the expensive computation required for complex online fine-tuning and massive feature extraction. Moreover, these methods suffer from over-fitting, since the training and testing stages of the CNN models are based on videos from the same domain. Recently, matching-based tracking methods (such as Siamese networks) have shown a remarkable speed advantage, but they cannot adequately handle target appearance variations and complex scenes because they inherently lack online adaptability and background information. In this paper, we propose a novel object-adaptive LSTM network, which can effectively exploit sequence dependencies and dynamically adapt to temporal object variations by constructing an intrinsic model of object appearance and motion. In addition, we develop an efficient proposal selection strategy, in which densely sampled proposals are first pre-evaluated using a fast matching-based method, and the selected high-quality proposals are then fed to the LSTM network, which is learned for each specific sequence. This strategy enables our method to adaptively track an arbitrary object and to run faster than conventional CNN-based classification tracking methods. To the best of our knowledge, this is the first work to apply an LSTM network for classification in visual object tracking. Experimental results on the OTB and TC-128 benchmarks show that the proposed method achieves state-of-the-art performance, which demonstrates the great potential of recurrent structures for visual object tracking.
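The two-stage proposal selection described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Boxes are assumed to be `(cx, cy, w, h)` tuples, proposals are sampled with Gaussian jitter around the previous box, cosine similarity between hypothetical feature vectors stands in for the fast matching-based (e.g., Siamese) pre-evaluation, and `classify` is a placeholder callable for the object-adaptive LSTM classifier. The helper names `sample_proposals`, `cosine`, and `select_then_classify` are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# A box is (cx, cy, w, h): center x, center y, width, height (assumed convention).
Box = Tuple[float, float, float, float]


def sample_proposals(prev: Box, n: int, pos_sigma: float = 0.1,
                     scale_sigma: float = 0.05, seed: int = 0) -> List[Box]:
    """Densely sample n candidate boxes around the previous target box.

    Gaussian jitter on position (relative to box size) and on log-scale is an
    assumption for illustration; the paper's sampling scheme may differ.
    """
    rng = random.Random(seed)
    cx, cy, w, h = prev
    proposals = []
    for _ in range(n):
        s = math.exp(rng.gauss(0.0, scale_sigma))  # multiplicative scale jitter
        proposals.append((cx + rng.gauss(0.0, pos_sigma) * w,
                          cy + rng.gauss(0.0, pos_sigma) * h,
                          w * s, h * s))
    return proposals


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; a stand-in for a fast matching score."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def select_then_classify(proposals: List[Box],
                         embed: Callable[[Box], Sequence[float]],
                         template: Sequence[float],
                         classify: Callable[[Box], float],
                         k: int) -> Tuple[Box, float, List[Box]]:
    """Cheap matching on all proposals, expensive classifier on the top k only."""
    if not proposals or k < 1:
        raise ValueError("need at least one proposal and k >= 1")
    # Stage 1: pre-evaluate every proposal with the cheap matching score.
    match = [cosine(embed(p), template) for p in proposals]
    top = sorted(range(len(proposals)), key=lambda i: match[i], reverse=True)[:k]
    kept = [proposals[i] for i in top]
    # Stage 2: run the expensive classifier (the LSTM's role) only on kept boxes.
    scores = [classify(p) for p in kept]
    best = max(range(len(kept)), key=lambda j: scores[j])
    return kept[best], scores[best], kept


if __name__ == "__main__":
    # Toy setup: the "true" target sits at (52, 48); features encode the offset.
    tx, ty = 52.0, 48.0

    def toy_embed(b: Box) -> Tuple[float, float, float]:
        return (1.0, b[0] - tx, b[1] - ty)

    def toy_classify(b: Box) -> float:
        return -((b[0] - tx) ** 2 + (b[1] - ty) ** 2)

    props = sample_proposals((50.0, 50.0, 20.0, 20.0), n=256, seed=1)
    box, score, kept = select_then_classify(props, toy_embed, (1.0, 0.0, 0.0),
                                            toy_classify, k=16)
    print(f"kept {len(kept)} of {len(props)} proposals; best box {box}")
```

The point the sketch isolates is cost: the expensive classifier is called only `k` times per frame rather than once per sampled proposal, which is the kind of saving the abstract attributes to this strategy. In a real tracker, `embed`, `template`, and `classify` would be learned networks rather than the toy functions used here.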

MP-LN: Motion State Prediction and Localization Network for Visual Object Tracking

Track Without Appearance: Learn Box and Tracklet Embedding with Local and Global Motion Patterns for Vehicle Tracking

RASTMTrack: Robust and Adaptive Space-Time Memory Networks for Visual Tracking

TLPG-Tracker: Joint Learning of Target Localization and Proposal Generation for Visual Tracking

Enhanced Multi-Object Tracking: Inferring Motion States of Tracked Objects

MotionTrack: Learning Motion Predictor for Multiple Object Tracking

Robust Visual Tracking Via Multiple Discriminative Models with Object Proposals

Deep Location-Specific Tracking

Real-time Visual Object Tracking with Natural Language Description

Visual Tracking Based on Multi-cue Proposals and Long Short-Term Features Learning

Prediction-Decision Network For Video Object Tracking

Object-Adaptive LSTM Network for Visual Tracking

LGTrack: Exploiting Local and Global Properties for Robust Visual Tracking

Learning to Track by Bi-Directional Long Short-Term Memory Networks

Learning Motion-Perceive Siamese network for robust visual object tracking

Deep Spatial and Temporal Network for Robust Visual Object Tracking

Target-Aware State Estimation for Visual Tracking

Learning Motion-Aware Policies for Robust Visual Tracking

MotionTrack: Learning Robust Short-term and Long-term Motions for Multi-Object Tracking

Multi-attention Associate Prediction Network for Visual Tracking

End-to-end Visual Object Tracking with Motion Saliency Guidance