Visual Tracking via Dynamic Memory Networks

Tianyu Yang, Antoni B. Chan

DOI: https://doi.org/10.48550/arXiv.1907.07613

2019-11-29

Abstract: Template-matching methods for visual tracking have gained popularity recently due to their good performance and fast speed. However, they lack effective ways to adapt to changes in the target object's appearance, making their tracking accuracy still far from state-of-the-art. In this paper, we propose a dynamic memory network to adapt the template to the target's appearance variations during tracking. The reading and writing process of the external memory is controlled by an LSTM network with the search feature map as input. A spatial attention mechanism is applied to concentrate the LSTM input on the potential target as the location of the target is at first unknown. To prevent aggressive model adaptivity, we apply gated residual template learning to control the amount of retrieved memory that is used to combine with the initial template. In order to alleviate the drift problem, we also design a "negative" memory unit that stores templates for distractors, which are used to cancel out wrong responses from the object template. To further boost the tracking performance, an auxiliary classification loss is added after the feature extractor part. Unlike tracking-by-detection methods where the object's information is maintained by the weight parameters of neural networks, which requires expensive online fine-tuning to be adaptable, our tracker runs completely feed-forward and adapts to the target's appearance changes by updating the external memory. Moreover, the capacity of our model is not determined by the network size as with other trackers; the capacity can be easily enlarged as the memory requirements of a task increase, which is favorable for memorizing long-term object information. Extensive experiments on the OTB and VOT datasets demonstrate that our trackers perform favorably against state-of-the-art tracking methods while retaining real-time speed.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses the following problem: template-matching methods for visual tracking are fast and perform well, but they lack an effective way to adapt to changes in the target object's appearance, so their tracking accuracy falls well short of the state of the art. Specifically, these methods struggle to handle appearance changes that arise during tracking, such as those caused by illumination, pose, and occlusion, while maintaining speed. To solve this, the paper proposes a dynamic memory network that adapts the template to the target's appearance variations through an external memory, thereby improving tracking accuracy. Because the tracker runs completely feed-forward and adapts by updating the memory rather than by online fine-tuning, it retains real-time speed.
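The abstract's idea of reading from an external memory and combining the result with the initial template through a gate can be illustrated with a small NumPy sketch. This is not the authors' implementation: the flattened slot layout, the cosine-similarity read, and the names `read_memory` and `gated_residual_template` are assumptions made for illustration, and in the paper the read key and gate would be produced by the LSTM controller rather than supplied by hand.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def read_memory(memory, read_key):
    """Soft attention read (assumed form): weight slots by cosine similarity.

    memory:   (num_slots, dim) array, one flattened template per slot.
    read_key: (dim,) array; hypothetical stand-in for a controller output.
    """
    norms = np.linalg.norm(memory, axis=1) * np.linalg.norm(read_key)
    sims = memory @ read_key / (norms + 1e-8)
    weights = softmax(sims)
    retrieved = weights @ memory  # weighted sum of memory slots
    return retrieved, weights


def gated_residual_template(initial_template, retrieved, gate):
    """Add only a gated portion of the retrieved memory to the initial template.

    gate: values in [0, 1]; 0 keeps the initial template unchanged.
    """
    return initial_template + gate * retrieved


# Toy example with two memory slots of dimension 2.
memory = np.array([[1.0, 0.0], [0.0, 1.0]])
read_key = np.array([1.0, 0.0])
initial_template = np.array([0.5, 0.5])

retrieved, weights = read_memory(memory, read_key)
final_template = gated_residual_template(initial_template, retrieved, gate=0.3)
```

With a gate of zero the final template equals the initial one, which is the sense in which the gate limits how aggressively the template adapts.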

RASTMTrack: Robust and Adaptive Space-Time Memory Networks for Visual Tracking

Exploit Spatiotemporal Contextual Information for 3D Single Object Tracking Via Memory Networks

Dynamic memory network with spatial-temporal feature fusion for visual tracking

STMTrack: Template-free Visual Tracking with Space-time Memory Networks

Attention-Driven Memory Network for Online Visual Tracking

Memory Network with Pixel-level Spatio-Temporal Learning for Visual Object Tracking

Memory network for tracking with deep regression

A joint local-global search mechanism for long-term tracking with dynamic memory network

Learning Spatio-Appearance Memory Network for High-Performance Visual Tracking

Learning Dynamic Compact Memory Embedding for Deformable Visual Object Tracking

Object Tracking via Spatial-Temporal Memory Network

Meta-Tracker: Fast and Robust Online Adaptation for Visual Object Trackers

DMV: Visual Object Tracking via Part-level Dense Memory and Voting-based Retrieval

Dynamic template updating Siamese network based on status feedback with quality evaluation for visual object tracking

Object-Adaptive LSTM Network for Real-time Visual Tracking with Adversarial Data Augmentation

Memory Mechanisms for Discriminative Visual Tracking Algorithms with Deep Neural Networks

Discriminative Segmentation Tracking Using Dual Memory Banks

One-stream Vision-Language Memory Network for Object Tracking

Learning Recurrent Memory Activation Networks for Visual Tracking

Reading Relevant Feature from Global Representation Memory for Visual Object Tracking