Abstract: Weakly supervised 3D object detection for autonomous driving has focused primarily on cars, which have distinct rectangular boundaries and abundant instances. Detecting categories with ambiguous rectangular boundaries and fewer instances, such as pedestrians and cyclists, remains challenging and has received limited research attention. Ambiguous boundaries make it difficult to generate accurate 3D pseudo labels, while the scarcity of instances often leads to convergence issues during detector training. Pedestrians and cyclists are dense inside their 3D bounding boxes but sparse at the corners and boundaries, so density is a practical cue for locating and discriminating them in point clouds. This paper proposes a density-based 3D pseudo-label generation module (DPL-3D) to address the challenge of ambiguous rectangular boundaries, which otherwise lead to poor pseudo-label quality. By leveraging the density information of 3D points, DPL-3D improves the accuracy and localization quality of the generated pseudo labels. It effectively segments background points, improving the estimates of the pseudo labels' location, dimensions, and orientation. A small number of training samples often drives training into local optima. Introducing multi-modal data in the detector network could strengthen the constraints on object features, but 2D images and 3D point clouds have a resolution gap. Our motivation for handling this gap is that neighboring regions with similar colors and textures in a 2D image may also be spatially close in 3D space. Therefore, a multi-modal network driven by superpixel segmentation is introduced. This network enables effective discrimination between objects in 2D images and 3D point clouds, bridging the resolution gap and leveraging complementary features from both modalities. Experimental results on the KITTI dataset demonstrate the effectiveness of the proposed methods in addressing the challenges of weakly supervised 3D object detection, particularly for categories with ambiguous rectangular boundaries and few instances.
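The density cue mentioned in the abstract can be illustrated with a small sketch. This is not the DPL-3D module, whose details the abstract does not give; it is a generic neighbor-count filter under assumed settings. The function names (`local_density`, `filter_by_density`, `axis_aligned_box`) and the `radius` and `min_neighbors` values are hypothetical, and the box is axis-aligned, so orientation is not estimated here.

```python
import numpy as np


def local_density(points, radius):
    """Count, for each point, how many other points lie within `radius`.

    Uses a brute-force O(N^2) distance matrix, which is fine for the
    few hundred points of a single object candidate.
    """
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    return (dists <= radius).sum(axis=1) - 1  # subtract the point itself


def filter_by_density(points, radius=0.3, min_neighbors=5):
    """Keep only points whose neighborhood is dense enough.

    Assumption: sparse points are treated as background or boundary noise.
    """
    density = local_density(points, radius)
    return points[density >= min_neighbors]


def axis_aligned_box(points):
    """Return (center, dimensions) of the axis-aligned box around `points`."""
    lo, hi = points.min(axis=0), points.max(axis=0)
    return (lo + hi) / 2.0, hi - lo
```

Calling `axis_aligned_box(filter_by_density(points))` on an `(N, 3)` array of candidate points would give a rough location and size from the dense core only, which is the intuition behind using density to tighten pseudo labels.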

Move to See Better: Self-Improving Embodied Object Detection

View-to-Label: Multi-View Consistency for Self-Supervised 3D Object Detection

Look Around and Learn: Self-Training Object Detection by Exploration

Learning 3D Perception from Others' Predictions

Approaching Outside: Scaling Unsupervised 3D Object Detection from 2D Scene

Augment and Criticize: Exploring Informative Samples for Semi-Supervised Monocular 3D Object Detection

Dual-Perspective Knowledge Enrichment for Semi-Supervised 3D Object Detection

Back to Reality: Learning Data-Efficient 3D Object Detector with Shape Guidance

Discovering Objects that Can Move

Aug3D-RPN: Improving Monocular 3D Object Detection by Synthetic Images with Virtual Depth

Weakly Supervised Monocular 3D Object Detection by Spatial-Temporal View Consistency

Self-supervisory Signals for Object Discovery and Detection

Towards Generalizable Multi-Camera 3D Object Detection via Perspective Debiasing

Embodied amodal recognition: Learning to move to perceive objects

3D Object Aided Self-Supervised Monocular Depth Estimation

Enhancing Pseudo Label Quality for Pedestrian and Cyclist in Weakly Supervised 3D Object Detection

Towards 3D Object Detection with 2D Supervision

An Empirical Study of Pseudo-Labeling for Image-based 3D Object Detection

Monocular 3D Object Detection with Motion Feature Distillation

Enhance the 3D Object Detection With 2D Prior

3M3D: Multi-view, Multi-path, Multi-representation for 3D Object Detection