Abstract: Current 3D single object tracking methods are typically based on VoteNet, a 3D region proposal network. Despite its success, using a single seed point feature as the cue for offset learning in VoteNet prevents high-quality 3D proposals from being generated. Moreover, seed points of differing importance are treated equally in the voting process, aggravating this defect. To address these issues, we propose a novel global-local transformer voting scheme that provides more informative cues and guides the model to pay more attention to potential seed points, promoting the generation of high-quality 3D proposals. Technically, a global-local transformer (GLT) module is employed to integrate object- and patch-aware priors into seed point features, forming a strong feature representation for the geometric positions of the seed points and thus providing more robust and accurate cues for offset learning. Subsequently, a simple yet effective training strategy is designed to train the GLT module: we develop an importance prediction branch to learn the potential importance of the seed points and treat the output weight vector as a training constraint term. By incorporating the above components, we present GLT-T, a superior tracking method. Extensive experiments on the challenging KITTI and NuScenes benchmarks demonstrate that GLT-T achieves state-of-the-art performance in the 3D single object tracking task. Further ablation studies show the advantages of the proposed global-local transformer voting scheme over the original VoteNet. Code and models will be available at https://github.com/haooozi/GLT-T.
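To make the idea of importance-aware voting concrete, the following is a minimal sketch in which each seed point casts a vote (seed position plus predicted offset) and the per-seed error is weighted by a predicted importance score. It is an illustration only, not the GLT-T formulation: the function name `weighted_vote_loss`, the L1 distance, and the normalization of the weights are all assumptions introduced here, and the toy arrays are made up.

```python
import numpy as np

def weighted_vote_loss(seeds, offsets, center, importance):
    """Importance-weighted L1 distance between votes and a target center.

    Hypothetical helper for illustration; not taken from the GLT-T code.
    """
    votes = seeds + offsets                         # (N, 3): each seed votes for a center
    per_seed = np.abs(votes - center).sum(axis=1)   # (N,): L1 error of each vote
    weights = importance / importance.sum()         # normalize so the weights sum to 1
    return float((weights * per_seed).sum())

# Toy data: two seeds vote exactly on the center, one outlier seed does not.
seeds = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [5.0, 5.0, 5.0]])
offsets = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]])
center = np.array([1.0, 1.0, 0.0])

# Treating all seeds equally versus down-weighting the outlier seed.
uniform = weighted_vote_loss(seeds, offsets, center, np.ones(3))
focused = weighted_vote_loss(seeds, offsets, center, np.array([1.0, 1.0, 0.01]))
```

With equal weights the outlier seed dominates the loss, whereas a low importance score for that seed shrinks its contribution, which is the intuition behind not treating all seed points equally.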

STGL: Spatial-Temporal Graph Representation and Learning for Visual Tracking

RASTMTrack: Robust and Adaptive Space-Time Memory Networks for Visual Tracking

Track Without Appearance: Learn Box and Tracklet Embedding with Local and Global Motion Patterns for Vehicle Tracking

Robust visual tracking via weighted incremental subspace learning

Joint Spatio-Temporal Similarity and Discrimination Learning for Visual Tracking

TLPG-Tracker: Joint Learning of Target Localization and Proposal Generation for Visual Tracking

Visual Tracking Via Locally Structured Gaussian Process Regression

Tracking in Multimedia Data Via Robust Reweighted Local Multi-Task Sparse Representation for Transportation Surveillance

Temporal Coherent and Graph Optimized Manifold Ranking for Visual Tracking

Joint spatio-temporal modeling for visual tracking

A Compensatory Algorithm for High-Speed Visual Object Tracking Based on Markov Chain

Describe and Attend to Track: Learning Natural Language guided Structural Representation and Visual Attention for Object Tracking

FGAGT: Flow-Guided Adaptive Graph Tracking

Robust visual tracking based on generative and discriminative model collaboration

Real Time Visual Tracking using Spatial-Aware Temporal Aggregation Network

Autoregressive Queries for Adaptive Tracking with Spatio-Temporal Transformers

GLT-T: Global-Local Transformer Voting for 3D Single Object Tracking in Point Clouds

Tracklets Predicting Based Adaptive Graph Tracking

Learning background-aware and spatial-temporal regularized correlation filters for visual tracking

TGLC: Visual object tracking by fusion of global-local information and channel information

Graph-Regularized Structured Support Vector Machine for Object Tracking