Abstract: This paper proposes an algorithm for automatically labeling 3D objects from 2D point or box prompts, especially focusing on applications in autonomous driving. Unlike prior work, our auto-labeler predicts 3D shapes instead of bounding boxes and does not require training on a specific dataset. We propose a Segment, Lift, and Fit (SLF) paradigm to achieve this goal. First, we segment high-quality instance masks from the prompts using the Segment Anything Model (SAM) and transform the remaining problem into predicting 3D shapes from given 2D masks. This problem is ill-posed and therefore challenging, because multiple 3D shapes can project to an identical mask. To tackle this issue, we then lift 2D masks to 3D forms and employ gradient descent to adjust their poses and shapes until the projections fit the masks and the surfaces conform to surrounding LiDAR points. Notably, since we do not train on a specific dataset, the SLF auto-labeler does not overfit to biased annotation patterns in the training set as other methods do. Thus, the generalization ability across different datasets improves. Experimental results on the KITTI dataset demonstrate that the SLF auto-labeler produces high-quality bounding box annotations, achieving an AP@0.5 IoU of nearly 90%. Detectors trained with the generated pseudo-labels perform nearly as well as those trained with actual ground-truth annotations. Furthermore, the SLF auto-labeler shows promising results in detailed shape predictions, providing a potential alternative for the occupancy annotation of dynamic objects.

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper aims to address the automation of 3D object annotation in the field of autonomous driving. Specifically, the authors propose a method to automatically generate 3D object shape labels from 2D cues (points or boxes). Unlike existing methods, this approach predicts not only 3D bounding boxes but also detailed 3D shapes, and it does not require training on specific datasets. This gives the method better generalization capabilities, allowing it to perform well across different datasets.

### Background and Challenges

1. **Need for 3D Annotation**: Modern robotics and autonomous driving systems require a large amount of annotated data to understand 3D scenes, especially annotations of dynamic objects such as vehicles and pedestrians.
2. **Difficulty of Manual Annotation**: Manually annotating a large number of 3D bounding boxes is a tedious and costly task, limiting the scalability of 3D object detectors.
3. **Need for Fine-Grained Annotation**: With the development of 3D perception models, there is an increasing demand for finer annotation granularity, such as voxel occupancy, but these fine-grained annotations further complicate the annotation process and reduce efficiency.

### Proposed Method

The authors propose a new method called Segment, Lift, and Fit (SLF) for automatic 3D annotation. The specific steps are as follows:

1. **Segment**: Using the input 2D cues (points or boxes), generate high-quality instance masks through the Segment Anything Model (SAM).
2. **Lift**: Lift the 2D instance masks to 3D form, representing the 3D objects using a Signed Distance Function (SDF).
3. **Fit**: Iteratively optimize the shape and pose of the 3D objects through gradient descent until their projections align with the 2D masks and their surfaces conform to the surrounding LiDAR points.

### Main Contributions

1. **Detailed Shape Prediction**: SLF predicts not only 3D bounding boxes but also detailed 3D shapes, improving annotation accuracy.
2. **No Training Required**: SLF does not rely on supervised training on specific datasets, avoiding overfitting issues and providing better generalization capabilities.
3. **Efficient Annotation**: Experimental results show that SLF generates high-quality 3D labels on the KITTI dataset, with AP@0.5 IoU close to 90%, and the performance of detectors trained with the generated pseudo-labels is close to that of detectors trained with real labels.

### Experimental Results

1. **Comparison with Unsupervised Auto-Labelers**: SLF outperforms other unsupervised auto-labelers on the KITTI validation set, especially on moderate and hard samples.
2. **Cross-Dataset Generalization**: On the more challenging nuScenes dataset, SLF outperforms supervised auto-labelers like MTrans, particularly in mAP and NDS metrics.
3. **Detector Performance**: Detectors trained with pseudo-labels generated by SLF outperform those trained with pseudo-labels generated by FGR across multiple metrics.

### Conclusion

This paper proposes a method called SLF to automatically generate 3D object shape labels from 2D cues, addressing the automation of 3D annotation in the field of autonomous driving. SLF not only predicts detailed 3D shapes but also has good generalization capabilities and efficient annotation performance.
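To make the Fit step from the summary above more concrete, here is a minimal toy sketch of gradient descent on an SDF so that its zero level set passes through LiDAR points. It is not the authors' implementation. Everything in it is an assumption for illustration: a sphere stands in for the paper's shape representation, the data is synthetic, only the LiDAR term is optimized (the mask reprojection term is left out), and the names `sphere_sdf` and `fit_sphere` are hypothetical.

```python
import math
import random


def sphere_sdf(point, center, radius):
    """Signed distance from a 3D point to a sphere surface (negative inside)."""
    return math.dist(point, center) - radius


def fit_sphere(points, center, radius, lr=0.1, steps=2000):
    """Gradient descent on the mean squared SDF value at the LiDAR points.

    Toy stand-in for the Fit step: the loss is zero when every point lies
    exactly on the surface. The mask term of the paper is NOT modeled here.
    """
    center = list(center)
    n = len(points)
    for _ in range(steps):
        grad_c = [0.0, 0.0, 0.0]
        grad_r = 0.0
        for p in points:
            dist = max(math.dist(p, center), 1e-9)
            residual = dist - radius
            # d(dist)/d(center) = (center - p) / dist, d(residual)/d(radius) = -1
            for k in range(3):
                grad_c[k] += 2.0 * residual * (center[k] - p[k]) / dist
            grad_r += -2.0 * residual
        center = [c - lr * g / n for c, g in zip(center, grad_c)]
        radius -= lr * grad_r / n
    return center, radius


# Synthetic "LiDAR" hits: only the half of the object facing a sensor near
# the origin is observed (assumed toy visibility model, noise-free).
rng = random.Random(0)
true_center, true_radius = (10.0, 2.0, 0.8), 1.0
points = []
for _ in range(100):
    d = [rng.gauss(0.0, 1.0) for _ in range(3)]
    norm = math.sqrt(sum(v * v for v in d))
    d[0] = -abs(d[0])  # flip directions so all hits lie on the sensor-facing side
    points.append(tuple(c + true_radius * v / norm for c, v in zip(true_center, d)))

# A rough initial guess plays the role of the "lifted" shape before fitting.
fitted_center, fitted_radius = fit_sphere(points, center=(9.4, 1.6, 0.5), radius=0.6)
mean_abs_sdf = sum(abs(sphere_sdf(p, fitted_center, fitted_radius)) for p in points) / len(points)
print(fitted_center, fitted_radius, mean_abs_sdf)
```

In a fuller version, a second loss term would penalize disagreement between the projected shape and the SAM mask, which is what constrains the parts of the object that LiDAR does not observe.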

Segment, Lift and Fit: Automatic 3D Shape Labeling from 2D Prompts

Autolabeling 3D Objects With Differentiable Rendering of SDF Shape Priors

3DSFLabelling: Boosting 3D Scene Flow Estimation by Pseudo Auto-labelling

Manual-Label Free 3D Detection via An Open-Source Simulator

ALPI: Auto-Labeller with Proxy Injection for 3D Object Detection using 2D Labels Only

You Only Label Once: 3D Box Adaptation from Point Cloud to Image with Semi-Supervised Learning

3D Point Cloud Labeling Tool for Driving Automatically

LABELMAKER: Automatic Semantic Label Generation from RGB-D Trajectories

LDLS: 3-D Object Segmentation Through Label Diffusion From 2-D Images

Better Call SAL: Towards Learning to Segment Anything in Lidar

AutoShape: Real-Time Shape-Aware Monocular 3D Object Detection

AutoInst: Automatic Instance-Based Segmentation of LiDAR 3D Scans

Detecting As Labeling: Rethinking LiDAR-camera Fusion in 3D Object Detection

Vision-Language Guidance for LiDAR-based Unsupervised 3D Object Detection

Self-Supervised Drivable Area Segmentation Using LiDAR's Depth Information for Autonomous Driving

LabelFormer: Object Trajectory Refinement for Offboard Perception from LiDAR Point Clouds

labelCloud: A Lightweight Domain-Independent Labeling Tool for 3D Object Detection in Point Clouds

Automatic Labeling to Generate Training Data for Online LiDAR-Based Moving Object Segmentation

PA3DNet: 3-D Vehicle Detection with Pseudo Shape Segmentation and Adaptive Camera-LiDAR Fusion

Domain generalization of 3D semantic segmentation in autonomous driving

Lidar Point Cloud Guided Monocular 3D Object Detection