Abstract: Learning discriminative target-specific feature representations for object localization is the core of 3D Siamese object tracking algorithms. Current Siamese trackers aggregate target information from only the latest template into the search area to construct target-specific features, which limits performance when the object is occluded or missing. To address this, we propose a novel temporal-aware Siamese tracking framework in which the rich target cues contained in a set of historical templates are integrated into the search area for reliable target-specific feature aggregation. Our method consists of three modules: a template set sampling module, a temporal feature enhancement module and a temporal-aware feature aggregation module. In the template set sampling module, a scoring network evaluates the tracking quality of each template so that only high-quality templates are collected into the historical template set. Given the initial feature embeddings of the historical templates, the temporal feature enhancement module concatenates all template embeddings and feeds them into a linear attention module for cross-template feature enhancement. The temporal-aware feature aggregation module then aggregates the target cues in each template into the search area to construct multiple historical target-specific search-area features. In particular, we fuse the generated target-specific features in the order in which the templates were collected, using an RNN-based module, so that the fusion weight of earlier template information is discounted to better fit the current tracking state. Finally, we feed the temporally fused target-specific feature into a modified CenterPoint detection head for target position regression. Extensive experiments on the KITTI, nuScenes and Waymo Open datasets demonstrate the effectiveness of the proposed method.
Source code is available at https://github.com/tqsdyy/TAT.
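
The template selection and ordered fusion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The quality scores are passed in directly rather than predicted by a scoring network, a fixed scalar decay (`keep`) stands in for the learned RNN-based fusion, and plain lists stand in for search-area feature maps. The names `select_templates`, `recurrent_fuse`, `effective_weights`, `threshold`, `max_size` and `keep` are all hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def select_templates(
    candidates: Sequence[Tuple[Vector, float]],
    threshold: float,
    max_size: int,
) -> List[Vector]:
    """Keep templates whose (given) quality score passes the threshold.

    Collection order is preserved and only the most recent `max_size`
    survivors are returned. The scores are assumed inputs here; in the
    paper they come from a scoring network.
    """
    kept = [feat for feat, score in candidates if score >= threshold]
    return kept[-max_size:]


def recurrent_fuse(features: Sequence[Vector], keep: float = 0.5) -> Vector:
    """Fuse features in collection order with a fixed-decay recurrence.

    h_0 = x_0 and h_t = keep * h_{t-1} + (1 - keep) * x_t, so older
    features are discounted. This is an assumed stand-in for the learned
    RNN-based module, not the module itself.
    """
    if not features:
        raise ValueError("at least one feature is required")
    hidden = list(features[0])
    for feat in features[1:]:
        hidden = [keep * h + (1.0 - keep) * x for h, x in zip(hidden, feat)]
    return hidden


def effective_weights(count: int, keep: float = 0.5) -> List[float]:
    """Return the weight each input ends up with after recurrent_fuse."""
    if count < 1:
        raise ValueError("count must be at least 1")
    weights = [1.0]
    for _ in range(count - 1):
        weights = [w * keep for w in weights] + [1.0 - keep]
    return weights


if __name__ == "__main__":
    # (feature, quality score) pairs in the order they were collected.
    history = [
        ([1.0, 0.0], 0.9),
        ([0.0, 1.0], 0.2),  # low score, e.g. an occluded frame
        ([0.0, 2.0], 0.8),
        ([4.0, 4.0], 0.7),
    ]
    templates = select_templates(history, threshold=0.5, max_size=3)
    print(recurrent_fuse(templates, keep=0.5))
    print(effective_weights(len(templates), keep=0.5))
```

In this sketch the most recent template always receives weight `1 - keep`, and every earlier contribution is multiplied by `keep` at each later step, which mirrors the idea of discounting older template information. A learned recurrent module would instead make the discount depend on the features themselves.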

TaCoTrack: Tracking Object with Temporal Context

Exploit Spatiotemporal Contextual Information for 3D Single Object Tracking Via Memory Networks

RASTMTrack: Robust and Adaptive Space-Time Memory Networks for Visual Tracking

Towards Real-World Visual Tracking with Temporal Contexts

ODTrack: Online Dense Temporal Token Learning for Visual Tracking

Object Tracking via Spatial-Temporal Memory Network

Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking

Exploiting Temporal Coherence for Self-Supervised Visual Tracking by Using Vision Transformer

Joint spatio-temporal modeling for visual tracking

ACTrack: Adding Spatio-Temporal Condition for Visual Object Tracking

STTracker: Spatio-Temporal Tracker for 3D Single Object Tracking

Modeling of Multiple Spatial-Temporal Relations for Robust Visual Object Tracking

Exploring reliable infrared object tracking with spatio-temporal fusion transformer

Temporal-Aware Siamese Tracker: Integrate Temporal Context for 3D Object Tracking

Real Time Visual Tracking using Spatial-Aware Temporal Aggregation Network

Dynamic memory network with spatial-temporal feature fusion for visual tracking

Temporal Correlation Meets Embedding: Towards a 2nd Generation of JDE-based Real-Time Multi-Object Tracking

TAPTRv3: Spatial and Temporal Context Foster Robust Tracking of Any Point in Long Video

CXTrack: Improving 3D Point Cloud Tracking with Contextual Information

Multi-Task Structure-Aware Context Modeling for Robust Keypoint-Based Object Tracking