Abstract: Memory mechanisms have attracted growing attention in tracking tasks because of their ability to learn long-term-dependent information. However, it is very challenging for existing memory modules to provide the tracker with the intrinsic attribute information of the target in complex scenes. In this article, drawing on biological visual memory mechanisms, we propose a novel online tracking method based on an attention-driven memory network, which mines discriminative memory information and enhances the robustness and reliability of the tracker. First, to reinforce the effectiveness of the memory content, we design a novel attention-driven memory network. In this network, the long-term memory module obtains property-level memory information by focusing on the state of the target at both the channel and spatial levels. As a complement, we add a short-term memory module to maintain good adaptability when the target undergoes drastic deformation. The attention-driven memory network adaptively adjusts the contributions of short-term and long-term memories to the tracking results under a weighted gradient harmonized loss. On this basis, to avoid degradation of model performance, we further propose an online memory updater (MU). It is designed to mine target information from the tracking results through a Mixer layer together with the online head network. By evaluating the confidence of the tracking results, the memory updater can accurately judge when to update the model, which guarantees the effectiveness of online memory updates. Finally, extensive experiments on several benchmark datasets, including object tracking benchmark-50/100 (OTB-50/100), temple color-128 (TC-128), unmanned aerial vehicles-123 (UAV-123), generic object tracking-10k (GOT-10k), visual object tracking-2016 (VOT-2016), and VOT-2018, show that the proposed method performs favorably against several advanced methods.
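To make the two ideas in the abstract concrete (blending long-term and short-term memory responses, and updating memory only when the tracking result is confident), the following is a minimal sketch. It is not the authors' implementation. The names `MemoryUpdater`, `maybe_update`, `fuse_responses`, the `threshold` and `capacity` values, and the fixed blending `weight` are all hypothetical. In the paper the contributions are adjusted adaptively under a weighted gradient harmonized loss, and the confidence comes from a Mixer layer and an online head network; here both are simply supplied as numbers.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MemoryUpdater:
    """Hypothetical confidence-gated memory bank (illustration only)."""
    threshold: float = 0.6   # assumed confidence cut-off, not from the paper
    capacity: int = 5        # assumed maximum number of stored entries
    memory: List[List[float]] = field(default_factory=list)

    def maybe_update(self, feature: List[float], confidence: float) -> bool:
        """Store `feature` only when the tracking result looks reliable."""
        if confidence < self.threshold:
            return False     # low confidence: keep the memory unchanged
        self.memory.append(list(feature))
        if len(self.memory) > self.capacity:
            self.memory.pop(0)   # drop the oldest entry
        return True


def fuse_responses(long_term: List[float],
                   short_term: List[float],
                   weight: float) -> List[float]:
    """Convex combination of two flattened response maps.

    `weight` is the share given to the long-term response; it is a fixed
    number here, whereas the paper adjusts the contributions adaptively.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if len(long_term) != len(short_term):
        raise ValueError("response maps must have the same length")
    return [weight * l + (1.0 - weight) * s
            for l, s in zip(long_term, short_term)]


if __name__ == "__main__":
    updater = MemoryUpdater(threshold=0.5, capacity=2)
    print(updater.maybe_update([0.1, 0.2], confidence=0.3))  # rejected
    print(updater.maybe_update([0.1, 0.2], confidence=0.9))  # accepted
    print(fuse_responses([1.0, 0.0], [0.0, 1.0], weight=0.25))
```

Gating the update on confidence is what keeps unreliable frames (for example, during occlusion) from contaminating the stored target information; the first-in-first-out eviction is just one simple choice for bounding the memory size.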

RASTMTrack: Robust and Adaptive Space-Time Memory Networks for Visual Tracking

Exploit Spatiotemporal Contextual Information for 3D Single Object Tracking Via Memory Networks

STMTrack: Template-free Visual Tracking with Space-time Memory Networks

Memory Network with Pixel-level Spatio-Temporal Learning for Visual Object Tracking

Object Tracking via Spatial-Temporal Memory Network

Dynamic memory network with spatial-temporal feature fusion for visual tracking

Memory network for tracking with deep regression

Learning Spatio-Appearance Memory Network for High-Performance Visual Tracking

Attention-Driven Memory Network for Online Visual Tracking

Visual Tracking via Dynamic Memory Networks

Real Time Visual Tracking using Spatial-Aware Temporal Aggregation Network

Beyond Local Search: Tracking Objects Everywhere with Instance-Specific Proposals

Spatio-Temporal Bi-directional Cross-frame Memory for Distractor Filtering Point Cloud Single Object Tracking

TF-SASM: Training-free Spatial-aware Sparse Memory for Multi-object Tracking

STMT: Spatio-temporal memory transformer for multi-object tracking

ODTrack: Online Dense Temporal Token Learning for Visual Tracking

Learning Dynamic Compact Memory Embedding for Deformable Visual Object Tracking

SimpleTrackV2: Rethinking the Timing Characteristics for Multi-Object Tracking

Toward Accurate Pixelwise Object Tracking via Attention Retrieval

Multi-Object Tracking and Segmentation with a Space-Time Memory Network

ST-TrackNet: A Multiple-Object Tracking Network Using Spatio-Temporal Information