CenterTube: Tracking Multiple 3D Objects with 4D Tubelets in Dynamic Point Clouds

Hao Liu, Yanni Ma, Qingyong Hu, Yulan Guo
DOI: https://doi.org/10.1109/tmm.2023.3241548
IF: 7.3
2023-01-01
IEEE Transactions on Multimedia
Abstract: 3D Multi-Object Tracking (MOT) in dynamic point cloud sequences is a fundamental research problem for several downstream tasks such as motion planning and action recognition. Existing methods usually rely on the traditional tracking-by-detection (TBD) paradigm, which performs tracking on the results produced by dedicated detectors. However, this two-stage framework usually cannot sufficiently exploit spatial-temporal information or end-to-end optimization, leading to sub-optimal tracking performance, especially when the object is partially or completely occluded. In this paper, we propose a joint detection and tracking framework named CenterTube for dynamic point cloud sequences. The key to our approach is to formulate the prediction of multiple object trajectories as the detection of 4D tubelets. In particular, the proposed CenterTube is composed of three head branches (a center branch, a regression branch, and a movement branch) that together estimate object center, object size, instance movement, and frame interval. Additionally, a Tube BEV-IoU (TB-IoU) is presented to link the generated clip-level tubelets and form the final tracks. Extensive experiments conducted on the KITTI-MOT and nuScenes datasets demonstrate that our model achieves competitive performance even though no ready-made detection results are used.
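The abstract does not define TB-IoU, so the following is only a minimal sketch of one plausible reading: score two clip-level tubelets by averaging their bird's-eye-view (BEV) IoU over the frames they share. The names `Box`, `Tubelet`, `bev_iou`, and `tube_bev_iou` are hypothetical, boxes are assumed to be axis-aligned (yaw is ignored), and the averaging rule is an assumption rather than the paper's definition.

```python
from typing import Dict, Tuple

# Assumed representation: axis-aligned BEV box (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]
# Assumed representation: a tubelet maps frame index -> BEV box.
Tubelet = Dict[int, Box]


def bev_iou(a: Box, b: Box) -> float:
    """IoU of two axis-aligned boxes in the BEV plane."""
    # Overlap extent along each axis, clamped at zero when boxes are disjoint.
    inter_x = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_y = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_x * inter_y
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def tube_bev_iou(t1: Tubelet, t2: Tubelet) -> float:
    """Mean BEV IoU over frames present in both tubelets (assumed rule)."""
    shared = sorted(set(t1) & set(t2))
    if not shared:
        return 0.0  # no temporal overlap, so nothing to compare
    return sum(bev_iou(t1[f], t2[f]) for f in shared) / len(shared)
```

Under this reading, two tubelets from overlapping clips would be linked into one track when their score exceeds a chosen threshold; the threshold and the matching strategy are left open here.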
computer science, information systems, telecommunications, software engineering