Abstract: 3D single object tracking is essential in autonomous driving and robotics. Existing methods often struggle with sparse and incomplete point clouds. To address these limitations, we propose a Multimodal-guided Virtual Cues Projection (MVCP) scheme that generates virtual cues to enrich sparse point clouds. Additionally, we introduce an enhanced tracker, MVCTrack, based on the generated virtual cues. Specifically, the MVCP scheme seamlessly integrates RGB sensors into LiDAR-based systems, leveraging a set of 2D detections to create dense 3D virtual cues that significantly alleviate the sparsity of point clouds. These virtual cues integrate naturally with existing LiDAR-based 3D trackers, yielding substantial performance gains. Extensive experiments demonstrate that our method achieves competitive performance on the nuScenes dataset.
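The abstract does not detail how MVCP lifts 2D detections into 3D. The sketch below illustrates one common way to do this, nearest-neighbor depth borrowing similar to Multimodal Virtual Point (MVP) 3D detection, and is an assumption rather than the authors' implementation. It assumes LiDAR points already transformed into the camera frame, a standard pinhole intrinsic matrix `K`, and a 2D box `(x1, y1, x2, y2)` from some 2D detector; `project_to_image`, `generate_virtual_cues` and `num_samples` are hypothetical names.

```python
import numpy as np


def project_to_image(points_cam, K):
    """Project (N, 3) camera-frame points to pixels with a pinhole model.

    Returns (N, 2) pixel coordinates and the (N,) depths (z values).
    """
    uvw = points_cam @ K.T             # homogeneous pixel coordinates
    uv = uvw[:, :2] / uvw[:, 2:3]      # perspective divide by depth
    return uv, points_cam[:, 2]


def generate_virtual_cues(points_cam, K, box_2d, num_samples=50, rng=None):
    """Hypothetical MVP-style sketch (assumption, not the MVCP source code).

    Samples pixels inside a 2D box, gives each sample the depth of the
    nearest projected LiDAR point, and unprojects it to a 3D virtual cue.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    x1, y1, x2, y2 = box_2d

    # Keep only points in front of the camera before projecting.
    front = points_cam[points_cam[:, 2] > 0]
    uv, depth = project_to_image(front, K)

    # LiDAR points whose projection falls inside the 2D detection.
    inside = (uv[:, 0] >= x1) & (uv[:, 0] <= x2) & (uv[:, 1] >= y1) & (uv[:, 1] <= y2)
    if not inside.any():
        return np.empty((0, 3))        # no depth support: no cues for this box
    uv_in, depth_in = uv[inside], depth[inside]

    # Uniformly sample pixel locations inside the box.
    samples = rng.uniform([x1, y1], [x2, y2], size=(num_samples, 2))

    # Nearest projected LiDAR point (in pixel space) supplies each depth.
    dist2 = ((samples[:, None, :] - uv_in[None, :, :]) ** 2).sum(axis=-1)
    z = depth_in[dist2.argmin(axis=1)]

    # Unproject: X = z * K^{-1} [u, v, 1]^T
    pix_h = np.hstack([samples, np.ones((num_samples, 1))])
    rays = pix_h @ np.linalg.inv(K).T
    return rays * z[:, None]
```

In this sketch each virtual cue inherits the depth of a measured LiDAR return, so a box containing only a few real points can still yield `num_samples` cues spread over the object's image footprint. A practical system would likely also use instance masks and handle occluded depths, which the sketch omits.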

What problem does this paper attempt to address?

The problem this paper addresses is that existing 3D single-object tracking (3D SOT) methods, used in autonomous driving and robotics, perform poorly on sparse and incomplete point clouds. Point clouds produced by LiDAR sensors are usually sparse, especially for distant or small objects, and this sparsity degrades tracking performance. To address this, the authors propose a Multimodal-guided Virtual Cues Projection (MVCP) scheme, which combines RGB images with LiDAR point clouds to generate virtual cues that enrich the sparse point cloud. They also introduce an enhanced tracker, MVCTrack, that operates on these virtual cues.

### Main Problem Summary:

1. **Point Cloud Sparsity**: Existing methods struggle with sparse and incomplete point clouds, especially for distant and small objects.
2. **Multi-modal Information Fusion**: How to effectively combine the dense semantic information in RGB images with LiDAR point clouds to improve tracking performance.

### Solutions:

- **MVCP Mechanism**: 2D detection results extracted from RGB images are converted into 3D virtual cues, increasing the density and completeness of the point cloud.
- **MVCTrack Framework**: The generated virtual cues are combined with the original point cloud and fed as input to a 3D single-object tracking network to improve tracking performance.

### Experimental Verification:

The paper reports extensive experiments on the large-scale nuScenes dataset, showing that the proposed method improves tracking performance across a range of scenarios, most notably for sparse point clouds and long-distance targets.

With these improvements, MVCTrack raises tracking accuracy for small and long-distance targets and shows potential for practical applications.
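The MVCTrack step of combining virtual cues with the original point cloud can be sketched as a concatenation followed by the usual crop of a search region around the previous target location. The indicator column marking each point as real (0) or virtual (1) and the axis-aligned crop are assumptions made for illustration; `fuse_point_cloud` and `crop_search_region` are hypothetical helpers, not the paper's API.

```python
import numpy as np


def fuse_point_cloud(real_points, virtual_points):
    """Stack real LiDAR points and virtual cues into one (N + M, 4) array.

    Hypothetical layout (assumption): columns 0-2 are x, y, z and
    column 3 is an indicator, 0.0 for real returns, 1.0 for virtual cues.
    """
    real = np.hstack([real_points[:, :3], np.zeros((len(real_points), 1))])
    virt = np.hstack([virtual_points[:, :3], np.ones((len(virtual_points), 1))])
    return np.vstack([real, virt])


def crop_search_region(points, center, half_extent):
    """Keep points inside an axis-aligned box around the previous target center."""
    mask = np.all(np.abs(points[:, :3] - center) <= half_extent, axis=1)
    return points[mask]
```

Keeping an indicator channel lets a downstream point-based tracker treat measured returns and virtual cues differently; the summary above does not state whether MVCTrack uses such a flag or how it weights the two sources.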

MVCTrack: Boosting 3D Point Cloud Tracking via Multimodal-Guided Virtual Cues
