Abstract: Template-matching methods for visual tracking have gained popularity recently due to their good performance and fast speed. However, they lack effective ways to adapt to changes in the target object's appearance, so their tracking accuracy remains far from state-of-the-art. In this paper, we propose a dynamic memory network to adapt the template to the target's appearance variations during tracking. The reading and writing process of the external memory is controlled by an LSTM network with the search feature map as input. A spatial attention mechanism is applied to concentrate the LSTM input on the potential target, since the location of the target is at first unknown. To prevent aggressive model adaptivity, we apply gated residual template learning to control the amount of retrieved memory that is combined with the initial template. To alleviate the drift problem, we also design a "negative" memory unit that stores templates for distractors, which are used to cancel out wrong responses from the object template. To further boost the tracking performance, an auxiliary classification loss is added after the feature extractor. Unlike tracking-by-detection methods, where the object's information is maintained by the weight parameters of neural networks and expensive online fine-tuning is required for adaptation, our tracker runs completely feed-forward and adapts to the target's appearance changes by updating the external memory. Moreover, the capacity of our model is not determined by the network size as with other trackers: the capacity can be easily enlarged as the memory requirements of a task increase, which is favorable for memorizing long-term object information. Extensive experiments on the OTB and VOT datasets demonstrate that our trackers perform favorably against state-of-the-art tracking methods while retaining real-time speed.
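
The memory read and the gated residual template described in the abstract can be illustrated with a small sketch. The abstract does not give the actual formulation, so everything here is an assumption made for illustration: the function names, the dot-product addressing with a softmax over memory slots, the per-element gate, and the use of flat lists in place of feature maps. In the paper the read key and gate would come from the LSTM controller; here they are passed in by hand.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def read_memory(memory, read_key):
    """Soft read: weight each memory slot by its similarity to the key.

    Assumption: dot-product similarity followed by a softmax.
    `memory` is a list of equally sized template vectors.
    """
    scores = [sum(k * v for k, v in zip(read_key, slot)) for slot in memory]
    weights = softmax(scores)
    dim = len(memory[0])
    retrieved = [
        sum(w * slot[i] for w, slot in zip(weights, memory))
        for i in range(dim)
    ]
    return retrieved, weights


def gated_residual_template(initial, retrieved, gate):
    """Combine the initial template with a gated share of the retrieved one.

    Assumption: final = initial + gate * retrieved, element-wise, with each
    gate value in [0, 1]. A gate of 0 leaves the initial template unchanged.
    """
    return [t0 + g * tr for t0, g, tr in zip(initial, gate, retrieved)]


# Hypothetical two-slot memory holding two stored templates.
memory = [[1.0, 0.0], [0.0, 1.0]]
retrieved, weights = read_memory(memory, read_key=[10.0, 0.0])
final_template = gated_residual_template([2.0, 2.0], retrieved, gate=[0.5, 0.5])
```

With the gate near zero the tracker falls back to the initial template, which is one way to read the abstract's point about preventing overly aggressive adaptation.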

High Speed Recurrent Regression Network for Visual Tracking

RASTMTrack: Robust and Adaptive Space-Time Memory Networks for Visual Tracking

Exploit Spatiotemporal Contextual Information for 3D Single Object Tracking Via Memory Networks

Memory network for tracking with deep regression

Learning Regression and Verification Networks for Robust Long-term Tracking

Learning Attentional Recurrent Neural Network for Visual Tracking

Online Multi-Target Tracking Using Recurrent Neural Networks

Real Time Visual Tracking using Spatial-Aware Temporal Aggregation Network

Siamese Residual Network for Efficient Visual Tracking

Recurrent Filter Learning for Visual Tracking

SMART: Joint Sampling and Regression for Visual Tracking

Dynamic memory network with spatial-temporal feature fusion for visual tracking

Robust Visual Object Tracking with Two-Stream Residual Convolutional Networks

Dual Deep Network for Visual Tracking

Object-Adaptive LSTM Network for Real-time Visual Tracking with Adversarial Data Augmentation

Learning Spatio-Appearance Memory Network for High-Performance Visual Tracking

Unsupervised Deep Representation Learning for Real-Time Tracking

Visual Tracking via Dynamic Memory Networks

Learning Hierarchical Features for Visual Object Tracking with Recursive Neural Networks

Temporal Restricted Visual Tracking Via Reverse-Low-Rank Sparse Learning

End-to-End Learning of Object Motion Estimation from Retinal Events for Event-Based Object Tracking