Learning 3D Perception from Others' Predictions

Jinsu Yoo, Zhenyang Feng, Tai-Yu Pan, Yihong Sun, Cheng Perng Phoo, Xiangyu Chen, Mark Campbell, Kilian Q. Weinberger, Bharath Hariharan, Wei-Lun Chao
2024-10-05
Abstract: Accurate 3D object detection in real-world environments requires a huge amount of annotated data with high quality. Acquiring such data is tedious and expensive, and often needs repeated effort when a new sensor is adopted or when the detector is deployed in a new environment. We investigate a new scenario to construct 3D object detectors: learning from the predictions of a nearby unit that is equipped with an accurate detector. For example, when a self-driving car enters a new area, it may learn from other traffic participants whose detectors have been optimized for that area. This setting is label-efficient, sensor-agnostic, and communication-efficient: nearby units only need to share the predictions with the ego agent (e.g., car). Naively using the received predictions as ground-truths to train the detector for the ego car, however, leads to inferior performance. We systematically study the problem and identify viewpoint mismatches and mislocalization (due to synchronization and GPS errors) as the main causes, which unavoidably result in false positives, false negatives, and inaccurate pseudo labels. We propose a distance-based curriculum, first learning from closer units with similar viewpoints and subsequently improving the quality of other units' predictions via self-training. We further demonstrate that an effective pseudo label refinement module can be trained with a handful of annotated data, largely reducing the data quantity necessary to train an object detector. We validate our approach on the recently released real-world collaborative driving dataset, using reference cars' predictions as pseudo labels for the ego car. Extensive experiments including several scenarios (e.g., different sensors, detectors, and domains) demonstrate the effectiveness of our approach toward label-efficient learning of 3D perception from other units' predictions.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The problem this paper addresses is how to use the 3D object detection predictions shared by nearby units (such as other vehicles) to train the ego car's 3D object detector, thereby reducing or avoiding the need for a large amount of labeled data.

### Main Challenges

1. **Viewpoint Mismatch**:
   - Because the two vehicles occupy different positions and viewpoints, an object visible to one vehicle may be occluded for, or outside the field of view of, the other. This produces false positives and false negatives in the pseudo-labels.
2. **Mislocalization**:
   - GPS errors and synchronization delays shift the positions of the received predictions. For example, a communication delay of 0.1 seconds at a speed of 60 miles per hour corresponds to a localization error of about 2.7 meters, which seriously degrades detector training.

### Solutions

To address these problems, the paper proposes a method called R&B-POP, which mainly includes the following steps:

1. **Improve the Quality of Pseudo-Labels**:
   - **Initial Filtering**: Remove predicted boxes that are too far away or that contain only sparse point clouds.
   - **Bounding Box Refinement Module**: Train a box refinement module on a small amount of manually labeled or simulated data to correct the positions of the predicted boxes.
   - **Coarse-to-Fine Sampling**: First sample candidate boxes over a larger range, then search a smaller range to improve localization accuracy.
2. **Distance-based Curriculum Learning**:
   - **Self-training**: First train on the high-quality pseudo-labels from nearby vehicles, then gradually expand to more distant vehicles to improve the detector's generalization.
   - **Distance-based Screening**: Set different confidence thresholds according to the distance between vehicles, so that only high-quality pseudo-labels are used in subsequent training.

### Experimental Verification

The paper reports extensive experiments on a real-world collaborative driving dataset. With these measures, average precision (AP) at IoU 0.5 rises from 22% to 56.5%, and this improvement is achieved with only 40 labeled frames.

### Summary

This research introduces a new learning scenario. By using the predictions of nearby vehicles, the ego car's 3D object detector can be trained efficiently, reducing the dependence on large amounts of labeled data while handling the challenges of viewpoint mismatch and mislocalization.
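
The 2.7-meter figure quoted for mislocalization, and the initial filtering step, can be illustrated with a minimal Python sketch. This is not the paper's implementation: the `PseudoLabel` fields, the function names, and the default values of `max_range_m` and `min_points` are all hypothetical. The sketch also assumes that "too far away" means distance from the ego car, which the summary does not specify.

```python
import math
from dataclasses import dataclass

MPH_TO_MPS = 0.44704  # exact: 1609.344 m per mile / 3600 s per hour


def latency_displacement_m(speed_mph: float, delay_s: float) -> float:
    """Distance a vehicle travels during a communication delay, in meters."""
    return speed_mph * MPH_TO_MPS * delay_s


@dataclass
class PseudoLabel:
    """Hypothetical received box, expressed in the ego car's frame."""
    x: float         # box center, meters
    y: float         # box center, meters
    num_points: int  # ego LiDAR points that fall inside the box


def initial_filter(labels, max_range_m=50.0, min_points=5):
    """Keep boxes that are close enough and contain enough points.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    return [
        box for box in labels
        if math.hypot(box.x, box.y) <= max_range_m
        and box.num_points >= min_points
    ]


if __name__ == "__main__":
    # 60 mph is about 26.8 m/s, so 0.1 s of delay is about 2.7 m of travel.
    print(f"{latency_displacement_m(60, 0.1):.2f} m")

    received = [
        PseudoLabel(10.0, 0.0, 20),  # near and dense: kept
        PseudoLabel(80.0, 0.0, 20),  # beyond max_range_m: dropped
        PseudoLabel(5.0, 5.0, 1),    # too few points: dropped
    ]
    print(initial_filter(received))
```

Filtering of this kind only removes boxes that the ego sensor can barely observe. Correcting the position of the remaining boxes is the job of the refinement module described above, which this sketch does not attempt.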