Abstract: LiDAR-based 3D single object tracking (SOT) is a challenging problem in robotics and autonomous driving. Existing approaches struggle because distant objects often have very sparse or partially occluded point clouds, which makes the features extracted by the model ambiguous. Ambiguous features make it hard to locate the target object and ultimately degrade tracking results. To address this problem, we build on the Transformer architecture and propose a Point-Track-Transformer (PTT) module for point cloud-based 3D single object tracking. Specifically, the PTT module generates fine-tuned attention features by computing attention weights, which guide the tracker to focus on the important features of the target and improve tracking in complex scenarios. To evaluate the PTT module, we embed it into the dominant method and construct a novel 3D SOT tracker named PTT-Net. In PTT-Net, we embed PTT into the voting stage and the proposal generation stage, respectively. The PTT module in the voting stage models the interactions among point patches and thereby learns context-dependent features. The PTT module in the proposal generation stage captures the contextual information between object and background. We evaluate PTT-Net on the KITTI and NuScenes datasets. Experimental results demonstrate the effectiveness of the PTT module and the superiority of PTT-Net, which surpasses the baseline by a noticeable margin of $\sim$10% in the Car category. Our method also shows a significant performance improvement in sparse scenarios. Overall, combining the Transformer with the tracking pipeline enables PTT-Net to achieve state-of-the-art performance on both datasets. Additionally, PTT-Net runs in real time at 40 FPS on an NVIDIA 1080Ti GPU. Our code is open-sourced for the research community at https://github.com/shanjiayao/PTT.
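
To illustrate what "generating attention features by computing attention weights" can mean, the following is a minimal sketch of scaled dot-product self-attention over a handful of per-point feature vectors. It is not the authors' PTT implementation: the function names `softmax` and `self_attention` are hypothetical, the learned query/key/value projections of a real Transformer layer are replaced by identity mappings, and the residual connection is an assumption made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Scaled dot-product self-attention with a residual connection.

    features: list of N feature vectors (one per point), each of length d.
    Assumption: queries, keys and values are the raw features themselves
    (identity projections); a trained module would learn these projections.
    Returns (refined_features, attention_weights).
    """
    d = len(features[0])
    scale = math.sqrt(d)
    refined, weights = [], []
    for q in features:
        # Similarity of this point to every point, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in features]
        w = softmax(scores)
        # Weighted sum of all point features: the attention feature.
        attended = [
            sum(w[j] * features[j][c] for j in range(len(features)))
            for c in range(d)
        ]
        # Residual: add the attention feature back onto the input feature.
        refined.append([f + a for f, a in zip(q, attended)])
        weights.append(w)
    return refined, weights


if __name__ == "__main__":
    points = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    refined, weights = self_attention(points)
    for w in weights:
        print([round(v, 3) for v in w])
```

Each row of `weights` sums to one, so every refined feature is the original feature plus a convex combination of all point features; points with similar features contribute more to each other's refined representation.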
