Visual Tracking via Dynamic Memory Networks

Tianyu Yang, Antoni B. Chan
DOI: https://doi.org/10.48550/arXiv.1907.07613
2019-11-29
Abstract: Template-matching methods for visual tracking have gained popularity recently due to their good performance and fast speed. However, they lack effective ways to adapt to changes in the target object's appearance, making their tracking accuracy still far from state-of-the-art. In this paper, we propose a dynamic memory network to adapt the template to the target's appearance variations during tracking. The reading and writing process of the external memory is controlled by an LSTM network with the search feature map as input. A spatial attention mechanism is applied to concentrate the LSTM input on the potential target as the location of the target is at first unknown. To prevent aggressive model adaptivity, we apply gated residual template learning to control the amount of retrieved memory that is used to combine with the initial template. In order to alleviate the drift problem, we also design a "negative" memory unit that stores templates for distractors, which are used to cancel out wrong responses from the object template. To further boost the tracking performance, an auxiliary classification loss is added after the feature extractor part. Unlike tracking-by-detection methods where the object's information is maintained by the weight parameters of neural networks, which requires expensive online fine-tuning to be adaptable, our tracker runs completely feed-forward and adapts to the target's appearance changes by updating the external memory. Moreover, the capacity of our model is not determined by the network size as with other trackers; the capacity can be easily enlarged as the memory requirements of a task increase, which is favorable for memorizing long-term object information. Extensive experiments on the OTB and VOT datasets demonstrate that our trackers perform favorably against state-of-the-art tracking methods while retaining real-time speed.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses a limitation of template-matching methods for visual tracking: although they are fast and perform well, they lack an effective way to adapt to changes in the target object's appearance, which leaves their tracking accuracy far from the state of the art. Specifically, these methods struggle to handle appearance changes that arise during tracking, such as those caused by illumination, pose, and occlusion, while maintaining speed. To solve this problem, the paper proposes a dynamic memory network that adapts the template to the target's appearance changes through an external memory, thereby improving tracking accuracy. Because the tracker runs completely feed-forward and adapts by updating the external memory rather than by online fine-tuning, it retains real-time speed.
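The two steps the abstract describes for adapting the template (reading from the external memory, then combining the retrieved template with the initial one through a gate) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes templates are flattened into plain lists of floats, that the read uses softmax attention over cosine similarity between a key and each memory slot, and that the final template is the initial template plus an element-wise gated copy of the retrieved one. In the paper the key and the gate come from the LSTM controller; here they are passed in as fixed inputs, and the function names and the `sharpness` parameter are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)  # small constant avoids division by zero


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def read_memory(memory, key, sharpness=1.0):
    """Retrieve a template as an attention-weighted sum of memory slots.

    Assumption: read weights are a softmax over scaled cosine similarities
    between the key and each slot.
    """
    weights = softmax([sharpness * cosine(key, slot) for slot in memory])
    dim = len(memory[0])
    return [sum(w * slot[i] for w, slot in zip(weights, memory)) for i in range(dim)]


def gated_residual_template(initial, retrieved, gate):
    """Combine templates as initial + gate * retrieved, element-wise.

    Assumption: gate values lie in [0, 1]; a gate of 0 keeps the initial
    template unchanged, which limits how aggressively the model adapts.
    """
    return [t0 + g * tr for t0, g, tr in zip(initial, gate, retrieved)]


if __name__ == "__main__":
    memory = [[1.0, 0.0], [0.0, 1.0]]   # two stored templates
    key = [1.0, 0.0]                    # stands in for the LSTM read key
    retrieved = read_memory(memory, key, sharpness=10.0)
    final = gated_residual_template([1.0, 2.0], retrieved, [0.5, 0.5])
    print(retrieved, final)
```

With a gate of all zeros the tracker falls back to the initial template, and larger gate values let more of the retrieved memory through. The "negative" memory for distractors and the memory write step are left out of this sketch.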