Abstract: Monocular 3D object detection is a challenging task in the self-driving and computer vision community. As a common practice, most previous works use manually annotated 3D box labels, which are expensive to produce. In this paper, we find that precisely and carefully annotated labels may be unnecessary in monocular 3D detection, which is an interesting and counterintuitive finding. Using rough labels that are randomly disturbed, the detector achieves accuracy very close to that of a detector trained on the ground-truth labels. We delve into the underlying mechanism and empirically find that, with respect to label accuracy, the 3D location part of the label matters more than the other parts. Motivated by these conclusions and by the precision of LiDAR 3D measurements, we propose a simple and effective framework, dubbed LiDAR point cloud guided monocular 3D object detection (LPCG). This framework can either reduce annotation costs or considerably boost detection accuracy without introducing extra annotation costs. Specifically, it generates pseudo labels from unlabeled LiDAR point clouds. Thanks to accurate LiDAR measurements in 3D space, such pseudo labels can replace manually annotated labels in the training of monocular 3D detectors, since their 3D location information is precise. LPCG can be applied to any monocular 3D detector to make full use of the massive unlabeled data in a self-driving system. As a result, on the KITTI benchmark, we take first place on both monocular 3D and BEV (bird's-eye-view) detection by a significant margin. On the Waymo benchmark, our method using 10% labeled data achieves accuracy comparable to the baseline detector using 100% labeled data. The code is released at https://github.com/SPengLiang/LPCG.
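
The "randomly disturbed" rough labels mentioned in the abstract can be pictured with a small sketch. It is not the authors' code: the `Box3D` type, the `disturb_box` helper, and the uniform multiplicative noise model are all assumptions made for illustration, and the abstract does not specify how the disturbance is drawn. The `parts` argument shows how one label component (location, size, or orientation) could be disturbed in isolation to study which part matters most.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Box3D:
    """Hypothetical 3D box label: center (x, y, z) and size (l, w, h) in meters, yaw in radians."""
    x: float
    y: float
    z: float
    l: float
    w: float
    h: float
    yaw: float


def disturb_box(box, rng, ratio=0.05, parts=("location", "size", "yaw")):
    """Return a copy of `box` with the selected parts randomly shifted.

    Each disturbed value is scaled by a factor drawn uniformly from
    [1 - ratio, 1 + ratio]. This noise model is an assumption, not
    something stated in the abstract.
    """
    def jitter(value):
        return value * (1.0 + rng.uniform(-ratio, ratio))

    changes = {}
    if "location" in parts:
        changes.update(x=jitter(box.x), y=jitter(box.y), z=jitter(box.z))
    if "size" in parts:
        changes.update(l=jitter(box.l), w=jitter(box.w), h=jitter(box.h))
    if "yaw" in parts:
        changes.update(yaw=jitter(box.yaw))
    return replace(box, **changes)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
gt = Box3D(x=2.0, y=1.5, z=20.0, l=4.0, w=1.8, h=1.5, yaw=0.3)

# Disturb only the size, leaving the 3D location and orientation exact.
rough_size_only = disturb_box(gt, rng, ratio=0.1, parts=("size",))

# Disturb every part of the label.
rough_all = disturb_box(gt, rng, ratio=0.1)
```

Training one detector per `parts` setting and comparing accuracy is one way to test which label component is most sensitive to noise, under the assumptions above.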

Learning Occupancy for Monocular 3D Object Detection

Leveraging Front and Side Cues for Occlusion Handling in Monocular 3D Object Detection

Towards Flexible 3D Perception: Object-Centric Occupancy Completion Augments 3D Object Detection

Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving

SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving

OCM3D: Object-Centric Monocular 3D Object Detection

ODM3D: Alleviating Foreground Sparsity for Semi-Supervised Monocular 3D Object Detection

Learning-based 3D Occupancy Prediction for Autonomous Navigation in Occluded Environments

A Simple Framework for 3D Occupancy Estimation in Autonomous Driving

Fully Sparse 3D Occupancy Prediction

OccupancyDETR: Using DETR for Mixed Dense-sparse 3D Occupancy Prediction

MonoOcc: Digging into Monocular Semantic Occupancy Prediction

SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction

Real-Time 3D Occupancy Prediction via Geometric-Semantic Disentanglement

Training an Open-Vocabulary Monocular 3D Object Detection Model without 3D Data

OccLoff: Learning Optimized Feature Fusion for 3D Occupancy Prediction

Monocular Occupancy Prediction for Scalable Indoor Scenes

Lidar Point Cloud Guided Monocular 3D Object Detection