Abstract: Multiple-object tracking (MOT) is a crucial component in autonomous driving systems. However, inaccurate object detection is often the bottleneck for MOT, and most detectors are not designed to take the temporal information across consecutive frames into consideration. To take advantage of such information, we design a novel data representation, the spatio-temporal (ST) map, which collects a batch of detection results spatio-temporally, and we train a novel network, ST-TrackNet, to assign a predicted track ID to each positive detection across a sequence. With our ST map detections fed into the tracker, the correlation of objects between adjacent frames becomes prominent, which improves the performance of the tracker in the data association step. Moreover, the long-term trajectory in a sequence also helps to refine the detection results. We train and evaluate our network on the KITTI dataset, a CARLA simulation dataset, and a dataset recorded in a factory environment. Our approach generally achieves superior performance over the state of the art.

Note to Practitioners: We investigate the MOT problem in this paper and propose a spatio-temporal pipeline as a solution. Object detection results produced by off-the-shelf object detectors are used to form the proposed ST maps. In low signal-to-noise ratio (SNR) situations, where the detections contain more false positives, our proposed framework can achieve more accurate and robust tracking results. Due to the simplicity and modular design of our framework, it can be applied directly after the detection stage to perform online tracking. The proposed method is evaluated on several datasets, and the experimental results demonstrate its effectiveness. Our method can also be used for other autonomous driving applications, such as path planning and trajectory prediction.
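To make the idea of collecting detections spatio-temporally more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each detection is reduced to a 2D centre in metres and that an ST map can be approximated as a stack of per-frame binary occupancy grids; the names `build_st_map`, `grid_size`, and `cell_size` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical detection format: (x, y) centre of one detection, in metres.
Detection = Tuple[float, float]


def build_st_map(
    frames: Sequence[Sequence[Detection]],
    grid_size: int = 8,
    cell_size: float = 1.0,
) -> List[List[List[int]]]:
    """Stack per-frame detections into a T x H x W binary occupancy volume.

    Assumption: one square grid per frame, with a cell set to 1 when at
    least one detection centre falls inside it.
    """
    st_map = []
    for detections in frames:
        grid = [[0] * grid_size for _ in range(grid_size)]
        for x, y in detections:
            col = int(x // cell_size)
            row = int(y // cell_size)
            # Detections outside the grid are silently dropped.
            if 0 <= row < grid_size and 0 <= col < grid_size:
                grid[row][col] = 1
        st_map.append(grid)
    return st_map


# Three consecutive frames: one object drifting along x, plus one
# isolated detection in the last frame (for example, a false positive).
frames = [
    [(1.2, 3.4)],
    [(2.1, 3.6)],
    [(3.0, 3.9), (7.5, 0.2)],
]
st_map = build_st_map(frames)
for t, grid in enumerate(st_map):
    occupied = [(r, c) for r, row in enumerate(grid) for c, v in enumerate(row) if v]
    print(t, occupied)
```

In such a stacked volume, a moving object appears as a connected streak along the time axis, whereas an isolated false positive does not, which is one way to read the claim that correlation between adjacent frames becomes prominent.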

Spatiotemporal adaptive attention 3D multiobject tracking for autonomous driving

Exploit Spatiotemporal Contextual Information for 3D Single Object Tracking Via Memory Networks

Online Multi-Object Tracking from A Bird's-Eye View by Fusion of Millimeter-Wave Radar and Vision

HSTrack: Bootstrap End-to-End Multi-Camera 3D Multi-object Tracking with Hybrid Supervision

3D Multiple Object Tracking with Multi-modal Fusion of Low-cost Sensors for Autonomous Driving

Spatio-Temporal Bi-directional Cross-frame Memory for Distractor Filtering Point Cloud Single Object Tracking

Spatial-Semantic and Temporal Attention Mechanism-Based Online Multi-Object Tracking

ShaSTA-Fuse: Camera-LiDAR Sensor Fusion to Model Shape and Spatio-Temporal Affinities for 3D Multi-Object Tracking

ST-TrackNet: A Multiple-Object Tracking Network Using Spatio-Temporal Information

ShaSTA: Modeling Shape and Spatio-Temporal Affinities for 3D Multi-Object Tracking

CC-3DT: Panoramic 3D Object Tracking via Cross-Camera Fusion

AttentionTrack: Multiple Object Tracking in Traffic Scenarios Using Features Attention

MCTrack: A Unified 3D Multi-Object Tracking Framework for Autonomous Driving

Standing Between Past and Future: Spatio-Temporal Modeling for Multi-Camera 3D Multi-Object Tracking

Relation3DMOT: Exploiting Deep Affinity for 3D Multi-Object Tracking from View Aggregation

3D Multi-Object Tracking in Point Clouds Based on Prediction Confidence-Guided Data Association

ADA-Track: End-to-End Multi-Camera 3D Multi-Object Tracking with Alternating Detection and Association

MSA-MOT: Multi-Stage Association for 3D Multimodality Multi-Object Tracking

STT: Stateful Tracking with Transformers for Autonomous Driving