LABELMAKER: Automatic Semantic Label Generation from RGB-D Trajectories

Silvan Weder, Hermann Blum, Francis Engelmann, Marc Pollefeys
2023-11-21
Abstract: Semantic annotations are indispensable to train or evaluate perception models, yet very costly to acquire. This work introduces a fully automated 2D/3D labeling framework that, without any human intervention, can generate labels for RGB-D scans at equal (or better) level of accuracy than comparable manually annotated datasets such as ScanNet. Our approach is based on an ensemble of state-of-the-art segmentation models and 3D lifting through neural rendering. We demonstrate the effectiveness of our LabelMaker pipeline by generating significantly better labels for the ScanNet datasets and automatically labelling the previously unlabeled ARKitScenes dataset. Code and models are available at https://labelmaker.org
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper addresses label generation for 3D semantic segmentation datasets: how to automatically produce high-quality 2D and 3D labels without human intervention. To this end, it proposes LabelMaker, a fully automatic 2D/3D labeling framework that generates labels comparable to or better than those of manually annotated datasets such as ScanNet, without relying on any manual annotation.

### Background and Motivation

1. **Importance of Semantic Awareness**:
   - Semantic awareness is central to computer vision and robotics, because meaningful interaction with an environment requires understanding what it contains.
   - Most recent solutions rely on deep neural networks, but training and evaluating these networks is challenging and requires large amounts of labeled data.
2. **Cost of Labeled Data**:
   - High-quality labeled data is usually very expensive to obtain, because semantic annotation is a slow, manual process.
   - The problem is especially acute in 3D semantic segmentation: existing 3D datasets are far smaller than 2D semantic segmentation datasets, and the label quality of many of them is low.
3. **Limitations of Existing Datasets**:
   - Although datasets such as ScanNet are large, their labels suffer from quality and consistency issues.
   - Newer datasets such as ARKitScenes provide a large number of RGB-D trajectories but lack dense semantic labels.

### Method Overview

1. **Base Models**:
   - Several state-of-the-art 2D and 3D segmentation models (such as InternImage, OVSeg, CMX, and Mask3D) each produce a separate label hypothesis.
   - These models were trained on different datasets, with different category definitions and prediction spaces.
2. **Consensus Voting**:
   - The predictions of all models are mapped into a unified label space, and a consensus vote produces the final 2D label for each frame.
3. **3D Enhancement**:
   - Neural Radiance Fields (NeRF) lift the 2D labels into 3D, further improving label quality through multi-view consistency and denoising.
   - The final 3D labels can be projected back into each frame, yielding multi-view consistent 2D labels for the entire trajectory.

### Experiments and Results

1. **Datasets**:
   - Experiments on three datasets, ScanNet, Replica, and ARKitScenes, verify the effectiveness of the method.
   - Label quality was evaluated against high-precision, manually annotated ground truth.
2. **Performance Metrics**:
   - Evaluation uses mean Intersection over Union (mIoU), mean class accuracy (mAcc), and total pixel accuracy (tAcc).
   - On multiple metrics, the labels generated by LabelMaker outperform the existing manual labels and the baseline methods.
3. **Ablation Study**:
   - Ablation experiments isolate the contributions of consensus voting and 3D enhancement.
   - Both components significantly improve label quality.

### Main Contributions

1. **Label Mapping**:
   - Provides detailed mappings between the indoor label sets NYU40, ADE20k, ScanNet, Replica, and WordNet.
2. **Automatic Labeling Pipeline**:
   - Proposes a fully automatic pipeline that generates high-quality 2D and 3D labels whose quality matches or exceeds that of existing manually labeled datasets.
3. **Labeling New Datasets**:
   - Generates high-quality dense labels for previously unlabeled datasets such as ARKitScenes.

### Conclusion

With the LabelMaker framework, this paper tackles label generation for 3D semantic segmentation datasets, automatically producing high-quality 2D and 3D labels and offering a new way to label large-scale datasets.
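The label-mapping contribution, which lets models trained on different label sets vote together, can be illustrated with a small lookup-table builder. This is a minimal sketch, not the paper's actual mapping: the class names, the choice of WordNet synsets as a pivot between label sets, and the helpers `build_lut` and `remap` are all hypothetical. Each native class id is translated to a unified id, and classes with no unified counterpart map to an ignore value so that the model abstains on them.

```python
# Hypothetical excerpt of one model's native classes, keyed to WordNet synsets as a pivot.
NATIVE_TO_SYNSET = {
    "wall": "wall.n.01",
    "floor": "floor.n.01",
    "chair": "chair.n.01",
    "armchair": "armchair.n.01",
    "painting": "painting.n.01",
}
# Hypothetical unified label space; several synsets may collapse into one class.
SYNSET_TO_UNIFIED = {
    "wall.n.01": "wall",
    "floor.n.01": "floor",
    "chair.n.01": "chair",
    "armchair.n.01": "chair",
}
UNIFIED_CLASSES = ["wall", "floor", "chair"]
IGNORE = -1  # native classes without a unified counterpart abstain


def build_lut(native_classes, native_to_synset, synset_to_unified, unified_classes):
    """Return a list mapping native class id -> unified class id (or IGNORE)."""
    unified_id = {name: i for i, name in enumerate(unified_classes)}
    lut = []
    for name in native_classes:
        synset = native_to_synset.get(name)
        target = synset_to_unified.get(synset)
        lut.append(unified_id[target] if target is not None else IGNORE)
    return lut


def remap(label_map, lut):
    """Apply the lookup table to a 2D label map given as nested lists."""
    return [[lut[c] for c in row] for row in label_map]


if __name__ == "__main__":
    native = ["wall", "floor", "chair", "armchair", "painting"]
    lut = build_lut(native, NATIVE_TO_SYNSET, SYNSET_TO_UNIFIED, UNIFIED_CLASSES)
    print(lut)  # [0, 1, 2, 2, -1]
    print(remap([[0, 3], [4, 1]], lut))  # [[0, 2], [-1, 1]]
```

Routing every label set through one shared pivot means each set needs a single mapping table rather than one table per pair of label sets.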
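The Consensus Voting step of the Method Overview can be sketched as a per-pixel weighted majority vote once every model's prediction has been translated into the unified label space. This is an illustrative sketch under stated assumptions: the helpers `map_to_unified` and `consensus_vote`, the per-model `weights`, and the `min_score` threshold are hypothetical, and LabelMaker's actual voting rule and weighting may differ.

```python
import numpy as np

IGNORE = -1  # unified id meaning "this model abstains at this pixel"


def map_to_unified(pred, lut):
    """Translate a model's native class ids into unified ids with a lookup table."""
    return np.asarray(lut)[pred]


def consensus_vote(preds, weights, num_classes, min_score=0.0):
    """Per-pixel weighted majority vote over label maps in the unified space.

    preds: list of (H, W) integer arrays, IGNORE where a model abstains.
    weights: one non-negative vote weight per model (hypothetical weighting).
    Pixels whose winning score does not exceed min_score are set to IGNORE.
    """
    scores = np.zeros((num_classes,) + preds[0].shape)
    for pred, weight in zip(preds, weights):
        for c in range(num_classes):
            scores[c] += weight * (pred == c)  # IGNORE never matches a class id
    labels = scores.argmax(axis=0)
    labels[scores.max(axis=0) <= min_score] = IGNORE
    return labels


if __name__ == "__main__":
    lut_a = [0, 1, 2, 3, IGNORE]  # hypothetical model A: 5 native classes
    lut_b = [1, 0, 3, 2]          # hypothetical model B: same classes, other order
    pred_a = map_to_unified(np.array([[0, 2], [4, 3]]), lut_a)
    pred_b = map_to_unified(np.array([[1, 0], [3, 2]]), lut_b)
    pred_c = np.array([[0, 2], [2, 1]])  # hypothetical model C, already unified
    result = consensus_vote([pred_a, pred_b, pred_c], [1.0, 1.0, 1.0], num_classes=4)
    print(result.tolist())  # [[0, 2], [2, 3]]
```

Raising `min_score` trades coverage for reliability: with three equally weighted models and `min_score=2.0`, a pixel keeps its label only when all three agree.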
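The 3D Enhancement step lifts 2D labels through neural rendering. One common formulation, introduced by Semantic-NeRF, composites per-sample class logits along each camera ray with the same volume-rendering weights used for color, and trains them with a cross-entropy loss against the 2D labels. The sketch below assumes such a Semantic-NeRF-style semantic head; it is not LabelMaker's implementation, and `render_semantics` and `semantic_loss` are hypothetical helpers. Because a single 3D field has to explain the consensus labels of every frame, labels that disagree across views are averaged out, and rendering the trained field from each camera yields multi-view consistent 2D labels.

```python
import numpy as np


def render_semantics(sigmas, deltas, sample_logits):
    """Composite per-sample semantic logits along one ray.

    sigmas: (N,) volume densities at N samples ordered front to back.
    deltas: (N,) distances between consecutive samples.
    sample_logits: (N, C) class logits from a hypothetical semantic head.
    Returns the rendered (C,) logits and the (N,) compositing weights.
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # Transmittance T_i = prod_{j<i} (1 - alpha_j): fraction of light reaching sample i.
    trans = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    weights = trans * alphas
    return weights @ sample_logits, weights


def semantic_loss(rendered_logits, label):
    """Cross-entropy between the rendered class distribution and a 2D consensus label."""
    z = rendered_logits - rendered_logits.max()  # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]


if __name__ == "__main__":
    sigmas = np.array([0.1, 50.0, 50.0])  # a nearly opaque surface at the second sample
    deltas = np.array([0.5, 0.5, 0.5])
    logits = np.array([[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]])
    rendered, w = render_semantics(sigmas, deltas, logits)
    print(int(rendered.argmax()))  # 1: the surface sample dominates the ray
    print(semantic_loss(rendered, 1) < semantic_loss(rendered, 0))  # True
```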
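The three metrics listed under Performance Metrics can all be derived from one confusion matrix. This is a minimal sketch under common conventions: an ignore value of -1 in the ground truth, unlabeled predictions counted as errors, and classes absent from a scene excluded from the means. The helper `segmentation_metrics` is hypothetical, and the paper's exact evaluation protocol may handle these cases differently.

```python
import numpy as np


def segmentation_metrics(gt, pred, num_classes, ignore=-1):
    """Compute mIoU, mAcc and tAcc from integer label maps.

    Pixels whose ground truth is `ignore` are skipped; a prediction equal to
    `ignore` counts as wrong. Classes absent from both maps are excluded from
    mIoU, and classes absent from the ground truth are excluded from mAcc.
    """
    gt, pred = np.asarray(gt).ravel(), np.asarray(pred).ravel()
    valid = gt != ignore
    gt, pred = gt[valid], pred[valid]
    # An extra last column collects pixels that received no prediction.
    pred = np.where(pred == ignore, num_classes, pred)
    cm = np.bincount(gt * (num_classes + 1) + pred,
                     minlength=num_classes * (num_classes + 1))
    cm = cm.reshape(num_classes, num_classes + 1)
    tp = np.diag(cm[:, :num_classes])
    gt_count = cm.sum(axis=1)
    pred_count = cm[:, :num_classes].sum(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        iou = tp / (gt_count + pred_count - tp)  # per-class IoU
        acc = tp / gt_count                      # per-class recall
    return {
        "mIoU": float(np.nanmean(iou)),
        "mAcc": float(np.nanmean(acc)),
        "tAcc": float(tp.sum() / cm.sum()),     # fraction of correct pixels
    }


if __name__ == "__main__":
    gt = [[0, 0], [1, 1]]
    pred = [[0, 1], [1, 1]]
    print(segmentation_metrics(gt, pred, num_classes=2))
    # mIoU ≈ 0.583, mAcc = 0.75, tAcc = 0.75
```

mAcc averages per-class recall, so rare classes weigh as much as frequent ones, whereas tAcc is dominated by large classes such as walls and floors; mIoU additionally penalizes false positives.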