Abstract: Accurate 3D object detection in real-world environments requires a large amount of high-quality annotated data. Acquiring such data is tedious and expensive, and the effort must often be repeated when a new sensor is adopted or when the detector is deployed in a new environment. We investigate a new scenario for constructing 3D object detectors: learning from the predictions of a nearby unit that is equipped with an accurate detector. For example, when a self-driving car enters a new area, it may learn from other traffic participants whose detectors have been optimized for that area. This setting is label-efficient, sensor-agnostic, and communication-efficient: nearby units need only share their predictions with the ego agent (e.g., a car). Naively using the received predictions as ground truth to train the ego car's detector, however, leads to inferior performance. We systematically study the problem and identify viewpoint mismatches and mislocalization (due to synchronization and GPS errors) as the main causes, which unavoidably result in false positives, false negatives, and inaccurate pseudo labels. We propose a distance-based curriculum that first learns from closer units with similar viewpoints and subsequently improves the quality of other units' predictions via self-training. We further demonstrate that an effective pseudo label refinement module can be trained with a small amount of annotated data, largely reducing the quantity of data necessary to train an object detector. We validate our approach on a recently released real-world collaborative driving dataset, using reference cars' predictions as pseudo labels for the ego car. Extensive experiments covering several scenarios (e.g., different sensors, detectors, and domains) demonstrate the effectiveness of our approach toward label-efficient learning of 3D perception from other units' predictions.
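The distance-based curriculum mentioned in the abstract can be pictured with a minimal sketch. It is not the authors' implementation: the `PseudoLabelFrame` structure, the `curriculum_stages` helper, and the distance thresholds are all hypothetical, and it assumes that "distance" means the distance between the ego car and the reference unit. The self-training and pseudo label refinement steps are omitted.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class PseudoLabelFrame:
    """One ego frame paired with a reference unit's predictions (hypothetical structure)."""
    frame_id: str
    unit_distance_m: float  # assumed: distance between ego car and reference unit
    boxes: list = field(default_factory=list)  # predictions, assumed already in ego coordinates


def curriculum_stages(frames: Sequence[PseudoLabelFrame],
                      max_distances_m: Sequence[float]) -> List[List[PseudoLabelFrame]]:
    """Return one training subset per stage.

    Stage k keeps frames whose unit distance is at most the k-th smallest
    threshold, so early stages use only close units (similar viewpoints)
    and later stages are supersets that add farther units.
    """
    thresholds = sorted(max_distances_m)
    return [[f for f in frames if f.unit_distance_m <= t] for t in thresholds]


# Illustrative data and thresholds only (not taken from the paper).
frames = [
    PseudoLabelFrame("a", 4.0),
    PseudoLabelFrame("b", 18.0),
    PseudoLabelFrame("c", 55.0),
]
stages = curriculum_stages(frames, max_distances_m=[10.0, 30.0, 80.0])
stage_ids = [[f.frame_id for f in stage] for stage in stages]
print(stage_ids)  # [['a'], ['a', 'b'], ['a', 'b', 'c']]
```

In a full pipeline, the detector trained on one stage would presumably be used to re-label or filter the pseudo labels of the next stage before training continues; that loop is left out here because the abstract does not specify it.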

Just Label What You Need: Fine-Grained Active Selection for Perception and Prediction through Partially Labeled Scenes

ActiveAD: Planning-Oriented Active Learning for End-to-End Autonomous Driving

Learning 3D Perception from Others' Predictions

Learning to Label with Active Learning and Reinforcement Learning

Label and Sample: Efficient Training of Vehicle Object Detector from Sparsely Labeled Data

Interactive Prediction and Decision-Making for Autonomous Vehicles: Online Active Learning with Traffic Entropy Minimization

An Active and Contrastive Learning Framework for Fine-Grained Off-Road Semantic Segmentation

Learning predictive representations in autonomous driving to improve deep reinforcement learning

Perception Without Vision for Trajectory Prediction: Ego Vehicle Dynamics as Scene Representation for Efficient Active Learning in Autonomous Driving

ST-P3: End-to-end Vision-based Autonomous Driving via Spatial-Temporal Feature Learning

LabelFormer: Object Trajectory Refinement for Offboard Perception from LiDAR Point Clouds

Pseudo-labeling for Scalable 3D Object Detection

Training Data Subset Search With Ensemble Active Learning

Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation

DeepDriving: Learning Affordance for Direct Perception in Autonomous Driving

Label Efficient Visual Abstractions for Autonomous Driving

Active Data Acquisition in Autonomous Driving Simulation

Active Learning of Robot Vision Using Adaptive Path Planning

One-vs-All Semi-Automatic Labeling Tool for Semantic Segmentation in Autonomous Driving

Learning Occlusion-aware Decision-making from Agent Interaction via Active Perception