MVCTrack: Boosting 3D Point Cloud Tracking via Multimodal-Guided Virtual Cues

Zhaofeng Hu, Sifan Zhou, Shibo Zhao, Zhihang Yuan
2024-12-04
Abstract: 3D single object tracking is essential in autonomous driving and robotics. Existing methods often struggle with sparse and incomplete point cloud scenarios. To address these limitations, we propose a Multimodal-guided Virtual Cues Projection (MVCP) scheme that generates virtual cues to enrich sparse point clouds. Additionally, we introduce an enhanced tracker, MVCTrack, based on the generated virtual cues. Specifically, the MVCP scheme seamlessly integrates RGB sensors into LiDAR-based systems, leveraging a set of 2D detections to create dense 3D virtual cues that significantly reduce the sparsity of point clouds. These virtual cues can naturally integrate with existing LiDAR-based 3D trackers, yielding substantial performance gains. Extensive experiments demonstrate that our method achieves competitive performance on the nuScenes dataset.
Computer Vision and Pattern Recognition, Robotics
What problem does this paper attempt to address?
This paper addresses the poor performance of existing 3D single-object tracking (3D SOT) methods on sparse and incomplete point clouds in autonomous driving and robotics. Point clouds produced by LiDAR sensors are usually sparse, especially for distant or small objects, and tracking performance declines as a result. To address this, the authors propose a Multimodal-guided Virtual Cues Projection (MVCP) scheme, which enriches the sparse point cloud with virtual cues generated by combining RGB images and LiDAR point cloud data. They also introduce an enhanced tracker, MVCTrack, which operates on the generated virtual cues.

### Main Problem Summary:

1. **Point Cloud Sparsity**: Existing methods have difficulty handling sparse and incomplete point cloud data, especially for distant and small objects.
2. **Multi-modal Information Fusion**: How to effectively combine the dense semantic information in RGB images with LiDAR point cloud data to improve tracking performance.

### Solutions:

- **MVCP Mechanism**: 2D detection results are extracted from RGB images and converted into 3D virtual cues, which increases the density and completeness of the point cloud.
- **MVCTrack Framework**: The generated virtual cues are combined with the original point cloud and provided as input to the 3D single-object tracking network to improve tracking performance.

### Experimental Verification:

The paper reports extensive experiments on the large-scale nuScenes dataset, indicating that the proposed method improves tracking performance across various scenarios, especially for sparse point clouds and long-distance targets.

With these improvements, MVCTrack improves tracking accuracy for small objects and long-distance targets, and the authors see potential for it in practical application scenarios.
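
The summary does not spell out how a 2D detection is lifted into 3D, so the following is only a minimal sketch of one plausible approach, not the authors' implementation. It assumes the LiDAR points are already in the camera frame, that a pinhole intrinsic matrix `K` is available, and that each pixel sampled inside the detection box borrows the depth of its nearest projected LiDAR point. The function names `generate_virtual_cues` and `merge_with_flag`, and the extra `is_virtual` channel, are hypothetical and chosen for illustration.

```python
import numpy as np


def generate_virtual_cues(lidar_xyz, K, box_2d, num_samples=50, rng=None):
    """Lift random pixels inside a 2D box to 3D using nearest-LiDAR depth.

    Assumptions (not taken from the paper):
      lidar_xyz: (N, 3) points already in the camera frame (z forward).
      K: (3, 3) pinhole camera intrinsic matrix.
      box_2d: (x_min, y_min, x_max, y_max) in pixels.
    Returns an (M, 3) array; M is 0 if no LiDAR point projects into the box.
    """
    rng = np.random.default_rng() if rng is None else rng
    x_min, y_min, x_max, y_max = box_2d

    # Keep points in front of the camera and project them to pixel coordinates.
    front = lidar_xyz[lidar_xyz[:, 2] > 0]
    uvw = front @ K.T
    uv = uvw[:, :2] / uvw[:, 2:3]

    # Real points whose projection falls inside the detection box.
    inside = (
        (uv[:, 0] >= x_min) & (uv[:, 0] <= x_max)
        & (uv[:, 1] >= y_min) & (uv[:, 1] <= y_max)
    )
    if not inside.any():
        return np.empty((0, 3))
    ref_uv, ref_depth = uv[inside], front[inside, 2]

    # Sample pixels uniformly inside the box.
    samples = rng.uniform([x_min, y_min], [x_max, y_max], size=(num_samples, 2))

    # Each sampled pixel borrows the depth of its nearest projected real point.
    dists = np.linalg.norm(samples[:, None, :] - ref_uv[None, :, :], axis=2)
    depth = ref_depth[dists.argmin(axis=1)]

    # Unproject: X = depth * K^-1 [u, v, 1]^T.
    pixels_h = np.hstack([samples, np.ones((num_samples, 1))])
    return (pixels_h @ np.linalg.inv(K).T) * depth[:, None]


def merge_with_flag(real_xyz, virtual_xyz):
    """Concatenate real and virtual points with a 0/1 'is_virtual' channel."""
    real = np.hstack([real_xyz, np.zeros((len(real_xyz), 1))])
    virtual = np.hstack([virtual_xyz, np.ones((len(virtual_xyz), 1))])
    return np.vstack([real, virtual])
```

In this sketch the merged `(N + M, 4)` array would be what a LiDAR-based tracker consumes. The flag channel is a design choice that lets a downstream network tell measured points from generated ones; whether MVCTrack uses such a channel is not stated in the summary.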