PTT: Point-Track-Transformer Module for 3D Single Object Tracking in Point Clouds

Jiayao Shan, Sifan Zhou, Zheng Fang, Yubo Cui

DOI: https://doi.org/10.48550/arXiv.2108.06455

2021-10-07

Abstract: 3D single object tracking is a key issue for robotics. In this paper, we propose a transformer module called Point-Track-Transformer (PTT) for point cloud-based 3D single object tracking. The PTT module contains three blocks for feature embedding, position encoding, and self-attention feature computation. Feature embedding aims to place features closer in the embedding space if they have similar semantic information. Position encoding is used to encode the coordinates of point clouds into high-dimensional, distinguishable features. Self-attention generates refined attention features by computing attention weights. In addition, we embed the PTT module into the open-source state-of-the-art method P2B to construct PTT-Net. Experiments on the KITTI dataset show that PTT-Net surpasses the state of the art by a noticeable margin (~10%). PTT-Net also achieves real-time performance (~40 FPS) on an NVIDIA 1080Ti GPU. Our code is open-sourced for the robotics community at https://github.com/shanjiayao/PTT.
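The three blocks named in the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a single linear layer for feature embedding, a linear map of the xyz coordinates for position encoding, and plain scaled dot-product self-attention with a residual connection. All names (`ptt_like_block`, `w_embed`, `w_pos`, and so on) and the random weights are hypothetical stand-ins for learned parameters; the released code may be organized quite differently.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Small random weights standing in for learned parameters (assumption)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def ptt_like_block(coords, feats, params):
    """Toy block: embed features, add a position encoding, then self-attend.

    coords: N x 3 point coordinates; feats: N x C per-point features.
    Returns (refined N x D features, N x N attention weights).
    """
    embedded = matmul(feats, params["w_embed"])   # feature embedding
    position = matmul(coords, params["w_pos"])    # position encoding
    x = [[e + p for e, p in zip(er, pr)] for er, pr in zip(embedded, position)]

    q = matmul(x, params["w_q"])
    k = matmul(x, params["w_k"])
    v = matmul(x, params["w_v"])

    k_t = [list(col) for col in zip(*k)]          # transpose of k
    scale = math.sqrt(len(q[0]))
    scores = matmul(q, k_t)
    weights = [softmax([s / scale for s in row]) for row in scores]

    attended = matmul(weights, v)
    # Residual connection: attention refines, rather than replaces, the input.
    refined = [[a + b for a, b in zip(ar, xr)] for ar, xr in zip(attended, x)]
    return refined, weights


rng = random.Random(0)
n_points, in_dim, dim = 5, 4, 8
params = {
    "w_embed": random_matrix(in_dim, dim, rng),
    "w_pos": random_matrix(3, dim, rng),
    "w_q": random_matrix(dim, dim, rng),
    "w_k": random_matrix(dim, dim, rng),
    "w_v": random_matrix(dim, dim, rng),
}
coords = random_matrix(n_points, 3, rng)
feats = random_matrix(n_points, in_dim, rng)
refined, weights = ptt_like_block(coords, feats, params)
print(len(refined), len(refined[0]))
```

Each row of `weights` is a probability distribution over all points, so every refined feature is a weighted mix of the whole cloud. Because no step depends on point order, shuffling the input points shuffles the output rows in the same way, which is the property an unordered point cloud calls for.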

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses 3D single-object tracking (3D SOT) in point-cloud data. The authors note that existing 3D SOT methods rely mainly on RGB-D cameras and may fail in environments with visual degradation or illumination changes. 3D LiDAR sensors are widely used for object tracking because they are insensitive to illumination changes and capture geometric information directly and more accurately. Even so, performing 3D SOT with point clouds alone still faces three challenges:

1. **Sparse and disordered point clouds**: the network must be permutation-invariant.
2. **Higher-dimensional state estimation**: 3D tracking must estimate more spatial parameters (e.g., x, y, z, w, h, l, ry), which costs more computation than 2D visual tracking.
3. **Non-rigid objects**: objects such as pedestrians are harder to track because stable features are difficult to extract.

To address these problems, the authors propose a Transformer-based module, the Point-Track-Transformer (PTT), for 3D single-object tracking in point clouds. The PTT module consists of three parts: feature embedding, position encoding, and self-attention. Together they weight the point-cloud features so that the tracker focuses on deep cues of the target. The authors also embed the PTT module into P2B, an existing open-source state-of-the-art method, to construct a new network, PTT-Net. On the KITTI dataset, PTT-Net surpasses the prior state of the art by about 10% and runs in real time (about 40 FPS on an NVIDIA 1080Ti GPU).
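To make the seven box parameters (x, y, z, w, h, l, ry) from challenge 2 concrete, the sketch below expands them into the eight corners of a 3D box. The summary does not define the convention, so this assumes a KITTI-style camera frame: (x, y, z) is the center of the bottom face, the y axis points down, l, h, and w lie along the box's local x, y, and z axes, and ry is a rotation about the y axis. The function name `box_corners` is hypothetical.

```python
import math


def box_corners(x, y, z, w, h, l, ry):
    """Return the 8 corners of a 7-parameter box as (x, y, z) tuples.

    Assumed convention (KITTI-style camera frame): (x, y, z) is the
    bottom-face center, y points down, and ry rotates about the y axis.
    """
    c, s = math.cos(ry), math.sin(ry)
    footprint = ((l / 2, w / 2), (l / 2, -w / 2), (-l / 2, -w / 2), (-l / 2, w / 2))
    corners = []
    for dy in (0.0, -h):                 # bottom face, then top face (y is down)
        for dx, dz in footprint:
            # Rotate the local offset about the y axis, then translate.
            corners.append((x + c * dx + s * dz, y + dy, z - s * dx + c * dz))
    return corners


corners = box_corners(1.0, 2.0, 3.0, w=2.0, h=1.5, l=4.0, ry=0.0)
xs = [p[0] for p in corners]
ys = [p[1] for p in corners]
zs = [p[2] for p in corners]
print(max(xs) - min(xs), max(ys) - min(ys), max(zs) - min(zs))
```

With ry = 0 the box extents along x, y, and z equal l, h, and w. A 2D tracker, by contrast, typically regresses only four image-plane values, which is the dimensionality gap the second challenge refers to.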

Real-Time 3D Single Object Tracking With Transformer

Multi-modal 3D Human Tracking for Robots in Complex Environment with Siamese Point-Video Transformer

PTTR: Relational 3D Point Cloud Object Tracking with Transformer

Instance-Guided Point Cloud Single Object Tracking With Inception Transformer

Beyond Traditional Driving Scenes: A Robotic-Centric Paradigm for 2D+3D Human Tracking Using Siamese Transformer Network

PVT-SSD: Single-Stage 3D Object Detector with Point-Voxel Transformer

Exploring Point-BEV Fusion for 3D Point Cloud Object Tracking with Transformer

3D Object Tracking with Transformer

Implicit and Efficient Point Cloud Completion for 3D Single Object Tracking

P2B: Point-to-Box Network for 3D Object Tracking in Point Clouds

DeepPCT: Single Object Tracking in Dynamic Point Cloud Sequences

EasyTrack: Efficient and Compact One-stream 3D Point Clouds Tracker

TM2B: Transformer-Based Motion-to-Box Network for 3D Single Object Tracking on Point Clouds

GLT-T: Global-Local Transformer Voting for 3D Single Object Tracking in Point Clouds

OST: Efficient One-stream Network for 3D Single Object Tracking in Point Clouds

PointTrackNet: An End-to-End Network For 3-D Object Detection and Tracking From Point Clouds

Point Tree Transformer for Point Cloud Registration

CT3D++: Improving 3D Object Detection with Keypoint-induced Channel-wise Transformer

PillarTrack: Redesigning Pillar-based Transformer Network for Single Object Tracking on Point Clouds