Abstract: Tracking-by-detection (TBD) methods achieve state-of-the-art performance on 3D tracking benchmarks for autonomous driving. On the other hand, tracking-by-attention (TBA) methods have the potential to outperform TBD methods, particularly for long occlusions and challenging detection settings. This work investigates why TBA methods continue to lag in performance behind TBD methods using a LiDAR-based joint detector and tracker called JDT3D. Based on this analysis, we propose two generalizable methods to bridge the gap between TBD and TBA methods: track sampling augmentation and confidence-based query propagation. JDT3D is trained and evaluated on the nuScenes dataset, achieving 0.574 on the AMOTA metric on the nuScenes test set, outperforming all existing LiDAR-based TBA approaches by over 6%. Based on our results, we further discuss some potential challenges with the existing TBA model formulation to explain the continued gap in performance with TBD methods. The implementation of JDT3D can be found at the following link: https://github.com/TRAILab/JDT3D.

What problem does this paper attempt to address?

The main problem this paper addresses is **why LiDAR-based tracking-by-attention (TBA) methods for multi-object tracking lag behind tracking-by-detection (TBD) methods in performance**. Specifically, the authors investigate this gap and attempt to bridge it using a LiDAR-based joint detection and tracking model called JDT3D.

### Problem Background

1. **Importance of Multi-Object Tracking (MOT)**:
   - In the perception system of an autonomous vehicle, multi-object tracking is a key component that allows the vehicle to reason and plan in a dynamic environment.
   - The task of MOT is to identify object trajectories, which requires each object to be accurately detected in the scene and consistently identified over time.
2. **Limitations of Existing Methods**:
   - Most MOT methods follow the tracking-by-detection (TBD) paradigm: objects are first detected independently in each frame, and the detections are then associated with existing tracks.
   - TBD methods currently perform best on 3D tracking benchmarks, but they cannot fully use temporal information to improve detection, and detection errors propagate to the tracking task.
   - Joint detection and tracking (JDT) methods attempt to unify detection and tracking end to end, but with LiDAR input their performance still falls short of TBD methods.

### Core Contributions of the Paper

1. **Proposing the JDT3D Model**:
   - JDT3D is a LiDAR-based TBA model that aims to improve the performance of TBA methods through two generalizable methods: track sampling augmentation and confidence-based query propagation.
2. **Track Sampling Augmentation**:
   - To address the sparse supervision signal in LiDAR data, the authors propose track sampling augmentation. This technique enriches the supervision signal by injecting the same objects into multiple LiDAR frames while maintaining temporal consistency.
3. **Confidence-Based Query Propagation**:
   - Existing TBA methods propagate queries differently during training and inference, which may cause the model to over-trust false-positive queries. The authors therefore propose a confidence-based query propagation strategy that keeps training and inference consistent.

### Experimental Results

- JDT3D achieves an AMOTA of 0.574 on the nuScenes test set, outperforming all existing LiDAR-based TBA approaches by over 6%.
- Through ablation experiments, the authors verify the effectiveness and generalization ability of track sampling augmentation and confidence-based query propagation.

### Summary

This paper examines the performance bottlenecks of LiDAR-based tracking-by-attention methods for multi-object tracking. Through the JDT3D model and its two proposed methods, it improves the performance of TBA methods and discusses why a gap with TBD methods remains.
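The two proposed methods can be illustrated with a minimal Python sketch. This is not the JDT3D implementation: the names `TrackQuery`, `inject_track`, and `propagate_by_confidence`, the toy point data, and the 0.4 threshold are all assumptions made for illustration. The sketch only shows the two ideas as described here: injecting the same object into every frame of a clip so that it stays temporally consistent, and filtering queries by confidence with one rule that could be shared by training and inference.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, z) of one LiDAR point


@dataclass
class TrackQuery:
    """Hypothetical container for a track query and its predicted score."""
    track_id: int
    confidence: float


def inject_track(clip: List[List[Point]], track: List[List[Point]]) -> List[List[Point]]:
    """Paste frame t of a sampled track into frame t of the clip.

    Reusing one track across consecutive frames, instead of pasting
    unrelated objects per frame, keeps the injected object consistent
    over time.
    """
    if len(track) != len(clip):
        raise ValueError("sampled track must cover every frame of the clip")
    return [frame + obj for frame, obj in zip(clip, track)]


def propagate_by_confidence(queries: List[TrackQuery], threshold: float) -> List[TrackQuery]:
    """Keep only the queries whose confidence meets the threshold.

    The intent is that this same rule runs in training and in inference.
    """
    return [q for q in queries if q.confidence >= threshold]


# Toy two-frame clip with one background point per frame.
clip = [[(0.0, 0.0, 0.0)], [(0.1, 0.0, 0.0)]]
# Toy sampled object that moves 1 m along x between the two frames.
track = [[(5.0, 2.0, 0.5)], [(6.0, 2.0, 0.5)]]
augmented = inject_track(clip, track)

# Assumed threshold of 0.4; the source does not state a value.
queries = [TrackQuery(1, 0.91), TrackQuery(2, 0.12), TrackQuery(3, 0.55)]
kept_ids = [q.track_id for q in propagate_by_confidence(queries, threshold=0.4)]
print(len(augmented[0]), len(augmented[1]), kept_ids)  # prints: 2 2 [1, 3]
```

In this toy run the low-confidence query 2 is dropped before the next frame, regardless of whether it would have matched a ground-truth object. A real pipeline would also need to paste the matching box annotations and handle collisions with existing objects, which the sketch omits.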

JDT3D: Addressing the Gaps in LiDAR-Based Tracking-by-Attention
