Abstract: Memory mechanisms have become increasingly popular in tracking tasks because of their ability to learn long-term dependencies. However, it is very challenging for existing memory modules to provide the tracker with intrinsic attribute information about the target in complex scenes. In this article, inspired by biological visual memory mechanisms, we propose a novel online tracking method based on an attention-driven memory network, which mines discriminative memory information and enhances the robustness and reliability of the tracker. First, to reinforce the effectiveness of the memory content, we design a novel attention-driven memory network. In this network, the long-term memory module obtains property-level memory information by attending to the state of the target at both the channel and spatial levels. As a complement, we add a short-term memory module to maintain good adaptability when the target undergoes drastic deformation. Guided by a weighted gradient harmonized loss, the attention-driven memory network adaptively adjusts the contributions of short-term and long-term memories to the tracking results. On this basis, to avoid degradation of model performance, we further propose an online memory updater (MU), which mines target information from the tracking results through the Mixer layer and the online head network jointly. By evaluating the confidence of the tracking results, the memory updater accurately judges when to update the model, which guarantees the effectiveness of online memory updates. Finally, extensive experiments on several benchmark datasets, including object tracking benchmark-50/100 (OTB-50/100), temple color-128 (TC-128), unmanned aerial vehicles-123 (UAV-123), generic object tracking-10k (GOT-10k), visual object tracking-2016 (VOT-2016), and VOT-2018, show that the proposed method performs favorably against several advanced methods.
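The abstract names three mechanisms: channel- and spatial-level attention in the long-term memory, an adaptive weighting of short-term and long-term memory, and a confidence-gated memory update. The following is a minimal NumPy sketch of those ideas under stated assumptions, not the authors' implementation. CBAM-style pooling and sigmoid gating stand in for the paper's attention modules, a softmax over two logits stands in for the adaptive short/long weighting, and a fixed confidence threshold stands in for the memory updater's judgment. All names (`channel_attention`, `spatial_attention`, `fuse_memories`, `MemoryBank`, `threshold=0.8`) are hypothetical, and the Mixer layer, online head network, and weighted gradient harmonized loss are not modeled.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Gate each channel of a (C, H, W) feature map (CBAM-style, assumed)."""
    avg = feat.mean(axis=(1, 2))          # (C,) average-pooled descriptor
    mx = feat.max(axis=(1, 2))            # (C,) max-pooled descriptor
    def mlp(v):                           # shared two-layer MLP with ReLU
        return w2 @ np.maximum(w1 @ v, 0.0)
    weights = sigmoid(mlp(avg) + mlp(mx)) # (C,) per-channel weights in (0, 1)
    return feat * weights[:, None, None]

def spatial_attention(feat):
    """Gate each spatial position of a (C, H, W) feature map.

    A learned convolution over the pooled maps is replaced here by a plain
    average of the two maps, a simplifying assumption for illustration.
    """
    avg = feat.mean(axis=0)               # (H, W) channel-averaged map
    mx = feat.max(axis=0)                 # (H, W) channel-max map
    weights = sigmoid(0.5 * (avg + mx))   # (H, W) per-position weights
    return feat * weights[None, :, :]

def fuse_memories(short_mem, long_mem, logits):
    """Blend short- and long-term memory with softmax-normalized weights."""
    a = np.exp(logits - logits.max())     # numerically stable softmax
    a = a / a.sum()
    return a[0] * short_mem + a[1] * long_mem, a

class MemoryBank:
    """Fixed-capacity memory that is only written when confidence is high."""

    def __init__(self, capacity, threshold=0.8):
        self.capacity = capacity
        self.threshold = threshold        # hypothetical confidence threshold
        self.items = []

    def maybe_update(self, feat, confidence):
        if confidence < self.threshold:   # unreliable result: skip the update
            return False
        self.items.append(feat)
        if len(self.items) > self.capacity:
            self.items.pop(0)             # drop the oldest entry
        return True

# Usage with random features standing in for backbone outputs.
rng = np.random.default_rng(0)
C, H, W, r = 8, 5, 5, 2
feat = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))

long_mem = spatial_attention(channel_attention(feat, w1, w2))
fused, alpha = fuse_memories(feat, long_mem, np.array([0.2, 1.0]))

bank = MemoryBank(capacity=2)
for conf in (0.9, 0.5, 0.95, 0.85):
    bank.maybe_update(fused, conf)
print(fused.shape, alpha, len(bank.items))
```

Gating the write on confidence means that frames with unreliable tracking results never enter the memory, which is the property the abstract attributes to the memory updater; how the confidence score itself is produced is left open here.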

Attention-based Gating Network for Robust Segmentation Tracking

Multi Feature Representation and Aggregation Network for Accurate and Robust Visual Tracking

Toward Accurate Pixelwise Object Tracking via Attention Retrieval

Chained-Tracker: Chaining Paired Attentive Regression Results for End-to-End Joint Multiple-Object Detection and Tracking

Graph Attention Tracking

Learning Rich Feature Representation and Aggregation for Accurate Visual Tracking

Graph Attention Network for Context-Aware Visual Tracking

Spatial graph attention network-based object tracking with adaptive cosine window

Towards Diverse Binary Segmentation via a Simple yet General Gated Network

Learning Spatio-Appearance Memory Network for High-Performance Visual Tracking

Tracking by Joint Local and Global Search: A Target-Aware Attention-Based Approach

Exploiting Weak Mask Representation with Convolutional Neural Networks for Accurate Object Tracking

Robust Visual Tracking by Segmentation

Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation

Tracklets Predicting Based Adaptive Graph Tracking

Distractor-aware Siamese Networks for Visual Object Tracking

SiamMGT: robust RGBT tracking via graph attention and reliable modality weight learning

Attention-Driven Memory Network for Online Visual Tracking

Siamese anchor-free object tracking with multiscale spatial attentions

A Discriminative Single-Shot Segmentation Network for Visual Object Tracking