Abstract: Accurate 3D object detection in real-world environments requires a large amount of high-quality annotated data. Acquiring such data is tedious and expensive, and the effort often has to be repeated when a new sensor is adopted or when the detector is deployed in a new environment. We investigate a new scenario for constructing 3D object detectors: learning from the predictions of a nearby unit that is equipped with an accurate detector. For example, when a self-driving car enters a new area, it may learn from other traffic participants whose detectors have been optimized for that area. This setting is label-efficient, sensor-agnostic, and communication-efficient: nearby units only need to share their predictions with the ego agent (e.g., a car). Naively using the received predictions as ground truths to train the ego car's detector, however, leads to inferior performance. We systematically study the problem and identify viewpoint mismatches and mislocalization (due to synchronization and GPS errors) as the main causes, which unavoidably result in false positives, false negatives, and inaccurate pseudo labels. We propose a distance-based curriculum, first learning from closer units with similar viewpoints and subsequently improving the quality of other units' predictions via self-training. We further demonstrate that an effective pseudo label refinement module can be trained with a small amount of annotated data, largely reducing the quantity of data necessary to train an object detector. We validate our approach on the recently released real-world collaborative driving dataset, using reference cars' predictions as pseudo labels for the ego car. Extensive experiments covering several scenarios (e.g., different sensors, detectors, and domains) demonstrate the effectiveness of our approach toward label-efficient learning of 3D perception from other units' predictions.
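To see why GPS and synchronization errors turn into mislocalized pseudo labels, it helps to write out the frame change the ego car has to perform. The minimal sketch below assumes that the reference car shares box centers in its own sensor frame and that both cars report planar poses (x, y, yaw) in a shared world frame. `Pose2D`, `to_world`, `to_local` and `ref_box_center_in_ego` are hypothetical names for illustration, not code from the paper.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Pose2D:
    """Hypothetical planar pose of a vehicle in a shared world frame."""
    x: float    # meters
    y: float    # meters
    yaw: float  # radians, counter-clockwise from the world x-axis


def to_world(pose: Pose2D, px: float, py: float) -> tuple[float, float]:
    """Map a point from the vehicle's local frame into the world frame."""
    c, s = math.cos(pose.yaw), math.sin(pose.yaw)
    return pose.x + c * px - s * py, pose.y + s * px + c * py


def to_local(pose: Pose2D, wx: float, wy: float) -> tuple[float, float]:
    """Inverse of to_world: translate to the vehicle origin, then rotate by -yaw."""
    dx, dy = wx - pose.x, wy - pose.y
    c, s = math.cos(pose.yaw), math.sin(pose.yaw)
    return c * dx + s * dy, -s * dx + c * dy


def ref_box_center_in_ego(ref_pose: Pose2D, ego_pose: Pose2D,
                          center: tuple[float, float]) -> tuple[float, float]:
    """Re-express a box center predicted by the reference car in the ego frame.

    Any error in ref_pose or ego_pose (e.g. GPS noise or a stale timestamp)
    propagates directly into the position of the resulting pseudo label.
    """
    wx, wy = to_world(ref_pose, *center)
    return to_local(ego_pose, wx, wy)
```

For example, with the ego car at the origin and a reference car 20 m ahead facing it, an object 10 m in front of the reference car maps to (10, 0) in the ego frame. If the reference car's reported heading is off by just 1 degree, the same object lands about 0.17 m away from its true position, and this offset grows linearly with the object's range from the reference car.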

What problem does this paper attempt to address?

The problem this paper addresses is how to use the 3D object detection predictions provided by nearby units (such as other vehicles) to train the 3D object detector of the ego car, thereby reducing or avoiding the need for large amounts of labeled data. Specifically, the paper explores a new scenario: building a 3D object detector by learning from the predictions of nearby units.

### Main Challenges

1. **Viewpoint Mismatch**:
   - Because the two vehicles occupy different positions and have different viewpoints, an object may be visible to one vehicle but occluded or outside the field of view of the other. This leads to false positives and false negatives in the pseudo-labels.
2. **Mislocalization**:
   - GPS errors and synchronization delays shift the predicted boxes away from the objects' true positions. For example, a communication delay of 0.1 seconds corresponds to a localization error of about 2.7 meters for an object moving at 60 miles per hour (60 mph ≈ 26.8 m/s, and 26.8 m/s × 0.1 s ≈ 2.7 m), which seriously degrades the trained detector.

### Solutions

To address these problems, the paper proposes a method named R&B-POP, which mainly consists of the following steps:

1. **Improve the Quality of Pseudo-Labels**:
   - **Initial Filtering**: Remove predicted boxes that are too far away or that contain too few LiDAR points.
   - **Bounding Box Refinement Module**: Train a box refinement module with a small amount of manually labeled data (or simulation data) to correct the positions of the predicted boxes.
   - **Coarse-to-Fine Sampling**: First sample candidate boxes over a large range, then refine them within a smaller range to improve localization accuracy.
2. **Distance-based Curriculum Learning**:
   - **Self-training**: First train on high-quality pseudo-labels from nearby vehicles, then gradually extend to more distant vehicles to improve the detector's generalization.
   - **Distance-based Screening**: Set confidence thresholds according to the distance between vehicles so that only high-quality pseudo-labels are used in subsequent training.

### Experimental Verification

The paper conducts extensive experiments on a real-world cooperative driving dataset to verify the method's effectiveness. With these improvements, average precision (AP) at IoU 0.5 rises from 22% to 56.5%, and this improvement is achieved with only 40 labeled frames.

### Summary

This research introduces a new learning scenario: by using the predictions of nearby vehicles, the ego car's 3D object detector can be trained efficiently, reducing the dependence on large amounts of labeled data while addressing the challenges posed by viewpoint mismatch and mislocalization.
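The 2.7 m mislocalization figure can be checked with a few lines of arithmetic, and the same numbers show why such an offset is so damaging under the AP at IoU 0.5 criterion. This sketch assumes a hypothetical 4.5 m long car and simplifies IoU to two identical axis-aligned boxes offset along their length; real 3D detection benchmarks use rotated boxes, so treat the numbers as illustrative.

```python
MPH_TO_MPS = 0.44704  # exact: 1 mile = 1609.344 m, 1 hour = 3600 s


def latency_offset_m(speed_mph: float, delay_s: float) -> float:
    """Distance an object moves while a shared prediction is in transit."""
    return speed_mph * MPH_TO_MPS * delay_s


def shifted_box_iou(length_m: float, shift_m: float) -> float:
    """IoU of two identical axis-aligned boxes offset by shift_m along their length.

    Width and height are shared by both boxes and cancel out, leaving
    (L - s) / (L + s) for |s| < L and 0 otherwise.
    """
    overlap = max(0.0, length_m - abs(shift_m))
    union = 2.0 * length_m - overlap
    return overlap / union


if __name__ == "__main__":
    offset = latency_offset_m(60.0, 0.1)   # about 2.68 m
    iou = shifted_box_iou(4.5, offset)     # hypothetical 4.5 m car, about 0.25
    print(f"offset = {offset:.2f} m, IoU = {iou:.2f}")
```

With L = 4.5 m and s ≈ 2.68 m the IoU is (L - s)/(L + s) ≈ 0.25, so a pseudo label displaced by a 0.1 s delay would not even match its own object at IoU 0.5. Under this simplification the IoU stays at or above 0.5 only while s ≤ L/3 = 1.5 m.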
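The initial filtering step can be sketched as two simple checks per received box. The range limit and minimum point count below are hypothetical defaults (no values are given here), boxes are assumed to be already expressed in the ego frame, and "sparse" is interpreted as "too few of the ego car's own LiDAR points fall inside the box"; `Box3D`, `count_points_in_box` and `initial_filter` are illustrative names.

```python
import math
from dataclasses import dataclass


@dataclass
class Box3D:
    x: float      # center in the ego frame (m)
    y: float
    z: float
    l: float      # length, width, height (m)
    w: float
    h: float
    yaw: float    # heading around the z-axis (rad)


def count_points_in_box(box: Box3D, points) -> int:
    """Count (x, y, z) points that fall inside a yaw-rotated 3D box."""
    c, s = math.cos(box.yaw), math.sin(box.yaw)
    n = 0
    for px, py, pz in points:
        dx, dy = px - box.x, py - box.y
        # Express the point in the box's own axes before the extent test.
        lx = c * dx + s * dy
        ly = -s * dx + c * dy
        if abs(lx) <= box.l / 2 and abs(ly) <= box.w / 2 and abs(pz - box.z) <= box.h / 2:
            n += 1
    return n


def initial_filter(boxes, points, max_range_m: float = 50.0, min_points: int = 5):
    """Drop boxes that are too far from the ego car or poorly supported by its LiDAR."""
    kept = []
    for b in boxes:
        if math.hypot(b.x, b.y) > max_range_m:
            continue  # too far away from the ego sensor
        if count_points_in_box(b, points) < min_points:
            continue  # too few ego LiDAR returns inside the box
        kept.append(b)
    return kept
```

Both checks only need the ego car's own point cloud, which fits the sensor-agnostic setting: the reference unit never has to share raw sensor data.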
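The coarse-to-fine sampling idea can be illustrated as a two-stage grid search over a box center in bird's-eye view. How candidates are scored is not specified here, so `score_fn` is a hypothetical placeholder (for example, the output of a learned refinement module); only the center is searched, while size and heading are held fixed. The stage ranges and step sizes are also assumptions.

```python
import itertools
from typing import Callable, Sequence, Tuple

Center = Tuple[float, float]


def grid_search(center: Center, half_range: float, step: float,
                score_fn: Callable[[Center], float]) -> Center:
    """Evaluate a square grid of candidate centers and return the best-scoring one."""
    cx, cy = center
    n = int(round(half_range / step))
    best, best_score = center, score_fn(center)
    for i, j in itertools.product(range(-n, n + 1), repeat=2):
        cand = (cx + i * step, cy + j * step)
        sc = score_fn(cand)
        if sc > best_score:
            best, best_score = cand, sc
    return best


def coarse_to_fine(center: Center, score_fn: Callable[[Center], float],
                   stages: Sequence[Tuple[float, float]] = ((3.0, 0.5), (0.5, 0.1))) -> Center:
    """Search a wide, sparse grid first, then a narrow, dense grid around its best point."""
    for half_range, step in stages:
        center = grid_search(center, half_range, step, score_fn)
    return center
```

With a toy score equal to the negative squared distance to a true center at (12.3, -1.7), a noisy initial box at (10, 0) moves to (12.5, -1.5) in the coarse stage and to about (12.3, -1.7) in the fine stage. The two stages evaluate 13 × 13 + 11 × 11 = 290 candidates, whereas a single 0.1 m grid over ±3 m would need 61 × 61 = 3721, which is the usual motivation for going coarse to fine.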
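The distance-based curriculum and screening steps can be sketched as staged pseudo-label selection. The stage boundaries (20 m and 40 m), the linear threshold schedule (stricter for farther units), and the `train_fn` callback are all hypothetical; the text above only states that nearer units come first and that thresholds depend on inter-vehicle distance. The self-training step in which the ego detector improves the labels of farther units is left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class RefUnit:
    """A nearby unit that shares its detections with the ego car (hypothetical)."""
    name: str
    distance_m: float                                              # distance to the ego car
    preds: list[tuple[Any, float]] = field(default_factory=list)  # (box, confidence)


def curriculum_stages(units, edges=(20.0, 40.0, float("inf"))):
    """Bucket units by distance so that the nearest bucket comes first."""
    stages, lo = [], 0.0
    for hi in edges:
        stages.append([u for u in units if lo <= u.distance_m < hi])
        lo = hi
    return stages


def score_threshold(distance_m, base=0.5, per_10m=0.1, cap=0.9):
    """Assumed schedule: the confidence threshold rises with distance, up to a cap."""
    return min(cap, base + per_10m * distance_m / 10.0)


def select_pseudo_labels(units):
    """Keep only predictions whose confidence clears their unit's threshold."""
    labels = []
    for u in units:
        thr = score_threshold(u.distance_m)
        labels.extend(box for box, score in u.preds if score >= thr)
    return labels


def run_curriculum(units, train_fn: Callable[[list], None]):
    """Train in rounds, adding the next-farthest bucket of units each round."""
    seen = []
    for stage in curriculum_stages(units):
        seen.extend(stage)
        train_fn(select_pseudo_labels(seen))
```

With these defaults, a unit 10 m away contributes predictions with confidence of at least 0.6, a unit 30 m away needs at least 0.8, and anything farther than 40 m hits the 0.9 cap. Each training round reuses the labels from the closer buckets and adds the next bucket on top.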

Learning 3D Perception from Others' Predictions

Pseudo-labeling for Scalable 3D Object Detection

Lidar Point Cloud Guided Monocular 3D Object Detection

Back to Reality: Learning Data-Efficient 3D Object Detector with Shape Guidance

Label-Efficient 3D Object Detection For Road-Side Units

3D Object Visibility Prediction in Autonomous Driving

Learning Ego 3D Representation as Ray Tracing

Exploiting Playbacks in Unsupervised Domain Adaptation for 3D Object Detection

Learning to Predict the 3D Layout of a Scene

Just Label What You Need: Fine-Grained Active Selection for Perception and Prediction through Partially Labeled Scenes

Vision-Language Guidance for LiDAR-based Unsupervised 3D Object Detection

Enhancing Pseudo Label Quality for Pedestrian and Cyclist in Weakly Supervised 3D Object Detection

Move to See Better: Self-Improving Embodied Object Detection

Unsupervised 3D Perception with 2D Vision-Language Distillation for Autonomous Driving

Semi-supervised 3D Object Detection with Proficient Teachers

3D Object Detection for Point Cloud in Virtual Driving Environment

Influence of Camera-LiDAR Configuration on 3D Object Detection for Autonomous Driving

Learning Clear Class Separation for Open-set 3D Detector in Autonomous Vehicle via Selective Forgetting

Motion Inspired Unsupervised Perception and Prediction in Autonomous Driving

Accurate Monocular Object Detection via Color-Embedded 3D Reconstruction for Autonomous Driving