Abstract: Semantic annotations are indispensable for training and evaluating perception models, yet very costly to acquire. This work introduces a fully automated 2D/3D labeling framework that, without any human intervention, can generate labels for RGB-D scans at an equal (or better) level of accuracy than comparable manually annotated datasets such as ScanNet. Our approach is based on an ensemble of state-of-the-art segmentation models and 3D lifting through neural rendering. We demonstrate the effectiveness of our LabelMaker pipeline by generating significantly better labels for the ScanNet datasets and automatically labeling the previously unlabeled ARKitScenes dataset. Code and models are available at https://labelmaker.org.

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper addresses label generation for 3D semantic segmentation datasets: how to automatically generate high-quality 2D and 3D labels without human intervention. To this end, it proposes LabelMaker, a fully automatic 2D/3D labeling framework that produces labels comparable to or better than those of manually annotated datasets such as ScanNet, without relying on manual annotation.

### Background and Motivation

1. **Importance of Semantic Awareness**:
   - Semantic awareness is central to computer vision and robotics and is crucial for meaningful interaction with the environment.
   - Most recent solutions rely on deep neural networks, but training and evaluating these networks is challenging because it requires large amounts of labeled data.
2. **Cost of Labeled Data**:
   - High-quality labeled data is usually very expensive to obtain because semantic labeling is a time-consuming manual process.
   - This is especially true for 3D semantic segmentation, where existing datasets are far smaller than 2D semantic segmentation datasets and the labeling quality of many of them is low.
3. **Limitations of Existing Datasets**:
   - Although datasets like ScanNet are large, their labels have quality and consistency issues.
   - Newer datasets like ARKitScenes provide a large number of RGB-D trajectories but lack dense semantic labels.

### Method Overview

1. **Base Models**:
   - Several state-of-the-art 2D and 3D segmentation models (such as InternImage, OVSeg, CMX, and Mask3D) generate different label hypotheses.
   - These models are trained on different datasets with different category definitions and prediction spaces.
2. **Consensus Voting**:
   - The predictions of the different models are mapped to a unified label space, and a consensus voting mechanism produces the final 2D label for each frame.
3. **3D Enhancement**:
   - Neural Radiance Fields (NeRF) lift the 2D labels into 3D space, further improving label quality through multi-view consistency and denoising.
   - The final 3D labels can be reprojected back to 2D to obtain multi-view consistent labels for the entire trajectory.

### Experiments and Results

1. **Datasets**:
   - Experiments were conducted on three datasets (ScanNet, Replica, and ARKitScenes) to verify the effectiveness of the method.
   - Label quality was evaluated against high-precision, manually labeled ground truth.
2. **Performance Metrics**:
   - Label quality was measured with mean Intersection over Union (mIoU), mean accuracy (mAcc), and total accuracy (tAcc).
   - On multiple metrics, the labels generated by LabelMaker outperform existing manual labels and baseline methods.
3. **Ablation Study**:
   - Ablation experiments verified the contributions of consensus voting and 3D enhancement.
   - Both components significantly improve label quality.

### Main Contributions

1. **Label Mapping**:
   - Fine-grained mappings between the indoor label sets NYU40, ADE20k, ScanNet, Replica, and WordNet.
2. **Automatic Labeling Pipeline**:
   - A fully automatic pipeline that generates high-quality 2D and 3D labels whose quality equals or exceeds that of existing manually labeled datasets.
3. **Labeling New Datasets**:
   - High-quality dense labels generated for previously unlabeled datasets such as ARKitScenes.

### Conclusion

Through the LabelMaker framework, this paper addresses label generation for 3D semantic segmentation datasets. It automatically produces high-quality 2D and 3D labels and offers a new approach to labeling large-scale datasets.
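The Method Overview describes mapping each base model's predictions into a unified label space and then combining them by consensus voting. The Python sketch below illustrates that idea as a per-pixel weighted majority vote. The lookup tables, per-model weights, the `min_votes` threshold, the `IGNORE` label, and the function name `consensus_vote` are hypothetical choices for illustration; they are not LabelMaker's actual configuration or code, and the paper's own weighting and tie-breaking may differ.

```python
import numpy as np

IGNORE = -1  # hypothetical label for "unmapped class" or "no consensus"


def consensus_vote(preds, luts, weights, num_classes, min_votes):
    """Per-pixel weighted vote over model predictions mapped to one label space.

    preds:   list of (H, W) integer maps, each in its model's native label space
    luts:    list of 1D arrays mapping native class IDs to unified IDs (IGNORE if unmapped)
    weights: per-model vote weights (hypothetical; tune per model)
    """
    h, w = preds[0].shape
    votes = np.zeros((num_classes, h, w))
    rows, cols = np.indices((h, w))
    for pred, lut, weight in zip(preds, luts, weights):
        unified = lut[pred]  # translate native class IDs into the unified space
        valid = unified != IGNORE
        # Add this model's weight to the vote of its predicted class at each pixel.
        np.add.at(votes, (unified[valid], rows[valid], cols[valid]), weight)
    label = votes.argmax(axis=0)  # ties go to the lower class ID
    label[votes.max(axis=0) < min_votes] = IGNORE  # too little agreement
    return label


# Toy example: models A and C already use the unified space {0: wall, 1: floor, 2: chair};
# model B natively predicts {0: floor, 1: seat, 2: sky}, and "sky" has no counterpart.
lut_identity = np.array([0, 1, 2])
lut_b = np.array([1, 2, IGNORE])
preds = [np.array([[0, 1], [2, 2]]),  # model A
         np.array([[2, 0], [1, 0]]),  # model B (native IDs)
         np.array([[0, 1], [2, 0]])]  # model C
result = consensus_vote(preds, [lut_identity, lut_b, lut_identity], [1.0, 1.0, 1.0],
                        num_classes=3, min_votes=2)
print(result.tolist())  # [[0, 1], [2, -1]]
```

The bottom-right pixel receives one vote each for wall, floor, and chair, so it falls below `min_votes` and is left as `IGNORE` rather than forced into a low-confidence class.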
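The Method Overview also notes that the final 3D labels can be reprojected to 2D for every frame of the trajectory. LabelMaker does this through its neural-rendering (NeRF) representation. The sketch below instead uses a simpler technique, a z-buffered pinhole projection of a labeled point cloud, purely to illustrate the reprojection geometry. The function name `project_labels`, the 3x3 intrinsics matrix `K`, and the 4x4 world-to-camera pose are assumptions for illustration.

```python
import numpy as np


def project_labels(points, labels, K, world_to_cam, height, width, ignore=-1):
    """Render a label image from labeled 3D points; the nearest point wins each pixel.

    Hypothetical helper: a z-buffered point projection, not NeRF rendering.
    """
    # Move world points into the camera frame using homogeneous coordinates.
    pts_h = np.hstack([points, np.ones((len(points), 1))])
    cam = (world_to_cam @ pts_h.T).T[:, :3]
    keep = cam[:, 2] > 0  # drop points behind the camera
    cam, labels = cam[keep], labels[keep]
    # Pinhole projection: [u*z, v*z, z] = K @ [x, y, z].
    uvz = (K @ cam.T).T
    u = np.round(uvz[:, 0] / uvz[:, 2]).astype(int)
    v = np.round(uvz[:, 1] / uvz[:, 2]).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    pix = (v * width + u)[inside]
    depth, labels = cam[inside, 2], labels[inside]
    # Z-buffer: sort by pixel, then by depth, and keep the first (nearest) point per pixel.
    order = np.lexsort((depth, pix))
    _, first = np.unique(pix[order], return_index=True)
    nearest = order[first]
    image = np.full(height * width, ignore, dtype=int)
    image[pix[nearest]] = labels[nearest]
    return image.reshape(height, width)


K = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [0.0, 0.0, 1.0]])  # toy intrinsics
points = np.array([[0.0, 0.0, 1.0],    # lands on pixel (1, 1) at depth 1
                   [0.0, 0.0, 2.0],    # same pixel, farther away: occluded
                   [1.0, 0.0, 1.0],    # lands on pixel (1, 2)
                   [0.0, 0.0, -1.0]])  # behind the camera
labels = np.array([5, 7, 3, 9])
image = project_labels(points, labels, K, np.eye(4), height=3, width=3)
print(image.tolist())  # [[-1, -1, -1], [-1, 5, 3], [-1, -1, -1]]
```

Sorting by (pixel, depth) and taking the first entry per pixel avoids relying on the write order of repeated fancy-index assignments, which NumPy does not guarantee.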
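The Performance Metrics item names mIoU, mAcc, and tAcc. The sketch below shows how these are commonly computed from a confusion matrix. It assumes the usual definitions: per-class IoU and per-class accuracy averaged over the classes present in the ground truth, and tAcc as overall pixel accuracy. It also assumes that -1 marks unannotated ground-truth pixels and that every prediction is a valid class ID. The paper's exact evaluation protocol (for example, which classes enter the average) may differ.

```python
import numpy as np


def confusion_matrix(gt, pred, num_classes, ignore=-1):
    """Count (ground truth, prediction) pairs; rows are GT classes, columns predictions."""
    valid = gt != ignore  # skip unannotated ground-truth pixels
    g, p = gt[valid], pred[valid]
    if np.any((p < 0) | (p >= num_classes)):
        raise ValueError("predictions must lie in [0, num_classes)")
    counts = np.bincount(g * num_classes + p, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def segmentation_metrics(conf):
    """mIoU and mAcc over classes present in the ground truth, plus total accuracy."""
    tp = np.diag(conf).astype(float)
    gt_count = conf.sum(axis=1)    # pixels per ground-truth class
    pred_count = conf.sum(axis=0)  # pixels per predicted class
    present = gt_count > 0
    iou = tp[present] / (gt_count + pred_count - tp)[present]
    acc = tp[present] / gt_count[present]
    return {"mIoU": iou.mean(), "mAcc": acc.mean(), "tAcc": tp.sum() / conf.sum()}


gt = np.array([[0, 0, 1], [1, 2, -1]])  # -1 marks an unannotated pixel
pred = np.array([[0, 1, 1], [1, 2, 0]])
metrics = segmentation_metrics(confusion_matrix(gt, pred, num_classes=3))
print({k: round(float(v), 3) for k, v in metrics.items()})
# {'mIoU': 0.722, 'mAcc': 0.833, 'tAcc': 0.8}
```

mAcc weights every class equally, while tAcc weights every pixel equally, so rare classes such as small objects influence mAcc far more than tAcc.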

LABELMAKER: Automatic Semantic Label Generation from RGB-D Trajectories

ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding

Semi-Automatic Labeling for Deep Learning in Robotics

LabelFormer: Object Trajectory Refinement for Offboard Perception from LiDAR Point Clouds

Segment, Lift and Fit: Automatic 3D Shape Labeling from 2D Prompts

Labeling Indoor Scenes with Fusion of Out-of-the-Box Perception Models

Building a Fully-Automatized Active Learning Framework for the Semantic Segmentation of Geospatial 3D Point Clouds

NeuralLabeling: A versatile toolset for labeling vision datasets using Neural Radiance Fields

From CAD models to soft point cloud labels: An automatic annotation pipeline for cheaply supervised 3D semantic segmentation

Labeling 3D scenes for Personal Assistant Robots

Automated Multimodal Data Annotation via Calibration With Indoor Positioning System

LabelFusion: A Pipeline for Generating Ground Truth Labels for Real RGBD Data of Cluttered Scenes

AutoInst: Automatic Instance-Based Segmentation of LiDAR 3D Scans

When the pen is mightier than the sword: semi-automatic 2 and 3D image labelling

Annotator: A Generic Active Learning Baseline for LiDAR Semantic Segmentation

labelCloud: A Lightweight Domain-Independent Labeling Tool for 3D Object Detection in Point Clouds

All You Need is LUV: Unsupervised Collection of Labeled Images using Invisible UV Fluorescent Indicators

SSR-2D: Semantic 3D Scene Reconstruction from 2D Images

Semi-Automatic Annotation of 3D Radar and Camera for Smart Infrastructure-Based Perception

SmartAnnotator: An Interactive Tool for Annotating Indoor RGBD Images

SLRNet: Semi-Supervised Semantic Segmentation Via Label Reuse for Human Decomposition Images