JDT3D: Addressing the Gaps in LiDAR-Based Tracking-by-Attention

Brian Cheong, Jiachen Zhou, Steven Waslander
2024-07-16
Abstract: Tracking-by-detection (TBD) methods achieve state-of-the-art performance on 3D tracking benchmarks for autonomous driving. On the other hand, tracking-by-attention (TBA) methods have the potential to outperform TBD methods, particularly for long occlusions and challenging detection settings. This work investigates why TBA methods continue to lag in performance behind TBD methods using a LiDAR-based joint detector and tracker called JDT3D. Based on this analysis, we propose two generalizable methods to bridge the gap between TBD and TBA methods: track sampling augmentation and confidence-based query propagation. JDT3D is trained and evaluated on the nuScenes dataset, achieving 0.574 on the AMOTA metric on the nuScenes test set, outperforming all existing LiDAR-based TBA approaches by over 6%. Based on our results, we further discuss some potential challenges with the existing TBA model formulation to explain the continued gap in performance with TBD methods. The implementation of JDT3D can be found at the following link: https://github.com/TRAILab/JDT3D.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The main problem this paper addresses is **why LiDAR-based tracking-by-attention (TBA) methods lag behind tracking-by-detection (TBD) methods in performance**. Specifically, the authors investigate and attempt to bridge the gap between the two paradigms by developing JDT3D, a new LiDAR-based, attention-based joint detection and tracking model.

### Problem Background

1. **Importance of Multi-Object Tracking (MOT)**:
   - In the perception system of an autonomous vehicle, multi-object tracking is a key component that allows the autonomous agent to reason and plan in a dynamic environment.
   - The task of MOT is to identify object trajectories, which requires each object to be accurately detected in the scene and consistently identified over time.
2. **Limitations of Existing Methods**:
   - Most MOT methods follow the tracking-by-detection (TBD) paradigm: objects are first detected independently in each frame, and the detections are then associated with existing tracks.
   - TBD methods currently perform best on 3D tracking benchmarks, but they cannot fully use temporal information to improve detection, and detection errors propagate to the tracking task.
   - Joint detection and tracking (JDT) methods attempt to unify detection and tracking in an end-to-end manner, but with LiDAR input their performance still falls short of TBD methods.

### Core Contributions of the Paper

1. **Proposing the JDT3D Model**:
   - JDT3D is a LiDAR-based TBA model that aims to improve the performance of TBA methods through two generalizable methods: track sampling augmentation and confidence-based query propagation.
2. **Track Sampling Augmentation**:
   - To address the sparse supervision signal in LiDAR data, the authors propose track sampling augmentation. This technique enriches the supervision signal by injecting objects into multiple LiDAR frames while keeping them temporally consistent.
3. **Confidence-Based Query Propagation**:
   - Existing TBA methods propagate queries differently during training and inference, which may cause the model to over-trust false-positive queries. The authors therefore propose a confidence-based query propagation strategy that makes training consistent with inference.

### Experimental Results

- JDT3D reaches an AMOTA of 0.574 on the nuScenes test set, outperforming existing LiDAR-based TBA methods by over 6%.
- Through ablation experiments, the authors verify the effectiveness and generalization ability of track sampling augmentation and confidence-based query propagation.

### Summary

This paper examines the performance bottlenecks of LiDAR-based tracking-by-attention methods through the JDT3D model, and proposes two methods, track sampling augmentation and confidence-based query propagation, that improve the performance of TBA methods.
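
The mismatch that confidence-based query propagation targets can be illustrated with a small Python sketch. It is not the JDT3D implementation: the `Query` class, both function names, and the 0.4 threshold are hypothetical, and it assumes that the inconsistent training-time rule keeps only queries matched to a ground-truth object, which the summary above does not state explicitly.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Query:
    """Hypothetical stand-in for one decoder query after processing a frame."""
    track_id: int
    confidence: float    # predicted score in [0, 1]
    matched_to_gt: bool  # only available during training


def propagate_gt_matched(queries: List[Query]) -> List[Query]:
    """Assumed training-only rule: keep queries matched to a ground-truth object.

    At inference there is no ground truth, so this rule cannot be reused there.
    """
    return [q for q in queries if q.matched_to_gt]


def propagate_by_confidence(queries: List[Query], threshold: float = 0.4) -> List[Query]:
    """Confidence-based rule: the same test applies in training and inference.

    The threshold value is an illustrative assumption, not taken from the paper.
    """
    return [q for q in queries if q.confidence >= threshold]


frame_queries = [
    Query(track_id=0, confidence=0.9, matched_to_gt=True),   # true positive
    Query(track_id=1, confidence=0.7, matched_to_gt=False),  # confident false positive
    Query(track_id=2, confidence=0.2, matched_to_gt=True),   # low-confidence match
    Query(track_id=3, confidence=0.1, matched_to_gt=False),  # background
]

gt_ids = [q.track_id for q in propagate_gt_matched(frame_queries)]
conf_ids = [q.track_id for q in propagate_by_confidence(frame_queries)]
print(gt_ids)    # [0, 2]
print(conf_ids)  # [0, 1]
```

Under the assumed ground-truth rule, the confident false positive (track 1) is never carried into the next frame during training, so the model gets no practice suppressing it. Under the confidence rule it is carried forward in training exactly as it would be at inference.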