Abstract: Weakly supervised 3D object detection for autonomous driving has focused primarily on cars, which have distinct rectangular boundaries and abundant instances. Detecting categories with ambiguous rectangular boundaries and fewer instances, such as pedestrians and cyclists, remains challenging and has received limited research attention. Ambiguous boundaries make it difficult to generate accurate 3D pseudo labels, while the scarcity of instances often leads to convergence issues during detector training. Pedestrians and cyclists are dense inside their 3D bounding boxes but sparse at the corners and boundaries, so point density is a practical cue for locating and discriminating them in point clouds. This paper proposes a density-based 3D pseudo-label generation module (DPL-3D) to address the challenge of ambiguous rectangular boundaries, which otherwise degrade pseudo-label quality. By leveraging the density of 3D points, DPL-3D improves the accuracy and localization quality of the generated pseudo labels: it effectively segments out background points, improving the estimates of the pseudo labels' location, dimensions, and orientation. Training with few samples tends to converge to local optima. Introducing multi-modal data into the detector network could strengthen the constraints on object features, but 2D images and 3D point clouds differ in resolution. Our motivation for handling this resolution gap is that neighboring regions with similar colors and textures in a 2D image are likely to be spatially close in 3D space. We therefore introduce a multi-modal network driven by superpixel segmentation. This network enables effective discrimination of objects in both 2D images and 3D point clouds, bridging the resolution gap and leveraging complementary features from both modalities.
Experimental results on the KITTI dataset demonstrate the effectiveness of the proposed methods for weakly supervised 3D object detection, particularly for categories with ambiguous rectangular boundaries and few instances.
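
The density cue described in the abstract can be illustrated with a small sketch. This is not the DPL-3D module itself, whose details the abstract does not give. It is a minimal illustration that assumes density is measured as a neighbor count within a fixed radius, that sparse points are treated as background, and that a box is fitted to the remaining points with PCA in the bird's-eye view. The function names, the `radius` and `min_neighbors` values, and the synthetic data are all hypothetical.

```python
import numpy as np


def local_density(points, radius):
    """Count, for each point, how many other points lie within `radius`."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    return (dists < radius).sum(axis=1) - 1  # subtract 1 to exclude the point itself


def density_filter(points, radius=0.25, min_neighbors=5):
    """Keep only points in dense regions (assumed to be foreground)."""
    return points[local_density(points, radius) >= min_neighbors]


def fit_box(points):
    """Fit a yaw-oriented 3D box: PCA in the x-y plane, extents along each axis."""
    mean_xy = points[:, :2].mean(axis=0)
    centered = points[:, :2] - mean_xy
    _, eigvecs = np.linalg.eigh(np.cov(centered.T))
    axes = eigvecs[:, ::-1]          # columns: major axis, then minor axis
    local = centered @ axes          # coordinates in the box frame
    lo, hi = local.min(axis=0), local.max(axis=0)
    center_xy = mean_xy + axes @ ((lo + hi) / 2)
    z_lo, z_hi = points[:, 2].min(), points[:, 2].max()
    return {
        "center": np.array([center_xy[0], center_xy[1], (z_lo + z_hi) / 2]),
        "size": np.array([hi[0] - lo[0], hi[1] - lo[1], z_hi - z_lo]),  # length, width, height
        "yaw": float(np.arctan2(axes[1, 0], axes[0, 0])),
    }


# Synthetic example (hypothetical data): a dense 5 x 3 x 4 grid of points
# standing in for an object, plus three isolated background points.
grid = np.stack(
    np.meshgrid(np.arange(5) * 0.1, np.arange(3) * 0.1, np.arange(4) * 0.1, indexing="ij"),
    axis=-1,
).reshape(-1, 3)
outliers = np.array([[5.0, 5.0, 0.0], [-4.0, 3.0, 1.0], [8.0, -2.0, 0.5]])
cloud = np.vstack([grid, outliers])

kept = density_filter(cloud)
box = fit_box(kept)
```

In this toy case the isolated points have no neighbors within the radius and are discarded, so the box is fitted only to the dense cluster. The pairwise distance matrix is quadratic in the number of points, which is acceptable for a small crop around one object but would need a spatial index for a full scene.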
