Abstract: Large-scale 2D datasets have been instrumental in advancing machine learning; however, progress in 3D vision tasks has been relatively slow. This disparity is largely due to the limited availability of 3D benchmarking datasets. In particular, creating real-world point cloud datasets for indoor scene semantic segmentation presents considerable challenges, including data collection within confined spaces and the costly, often inaccurate process of per-point labeling to generate ground truths. While synthetic datasets address some of these challenges, they often fail to replicate real-world conditions, particularly the occlusions that occur in point clouds collected from real environments. Existing 3D benchmarking datasets typically evaluate deep learning models under the assumption that training and test data are independently and identically distributed (IID), which affects the models' usability for real-world point cloud segmentation. To address these challenges, we introduce the BelHouse3D dataset, a new synthetic point cloud dataset designed for 3D indoor scene semantic segmentation. This dataset is constructed using real-world references from 32 houses in Belgium, ensuring that the synthetic data closely aligns with real-world conditions. Additionally, we include a test set with data occlusion to simulate out-of-distribution (OOD) scenarios, reflecting the occlusions commonly encountered in real-world point clouds. We evaluate popular point-based semantic segmentation methods using our OOD setting and present a benchmark. We believe that BelHouse3D and its OOD setting will advance research in 3D point cloud semantic segmentation for indoor scenes, providing valuable insights for the development of more generalizable models.

What problem does this paper attempt to address?

The paper addresses the lack of a benchmark dataset for evaluating occlusion robustness in 3D point cloud semantic segmentation, particularly for indoor scenes. Specifically:

1. **Limitations of existing datasets**:
   - **Challenges of real-world datasets**: Collecting and labeling real-world 3D point cloud data is difficult and costly, especially indoors, because data must be captured in confined spaces and every point must be labeled to generate the ground truth.
   - **Deficiencies of synthetic datasets**: Synthetic datasets solve some of these problems, but they often fail to reproduce real-world conditions, in particular the occlusions that are common in real point clouds.
2. **Limitations of the independent and identically distributed (IID) assumption**:
   - Existing 3D benchmark datasets usually assume that training and test data are sampled independently from the same distribution. In practice, factors such as occlusion often make the test data out-of-distribution (OOD), which can degrade model performance in the real world.
3. **Lack of a dedicated OOD evaluation dataset**:
   - OOD generalization has been studied in other fields, but 3D point cloud semantic segmentation, especially for indoor scenes, lacks a dedicated OOD benchmark dataset for evaluating how well models generalize.

To address these problems, the authors introduce BelHouse3D, a new synthetic 3D point cloud dataset designed for semantic segmentation of indoor scenes. Its main features are:

- **Based on real-world references**: The dataset is built from real-world references from 32 houses in Belgium, so that the synthetic data closely aligns with real-world conditions.
- **Test set with occlusion**: A test set with occlusion simulates real-world OOD scenarios and is used to evaluate model performance when parts of the scene are occluded.
- **Benchmark evaluation**: Popular point-based semantic segmentation methods are evaluated, showing how the performance of different models changes under OOD conditions.

Through these contributions, BelHouse3D aims to promote research in 3D point cloud semantic segmentation, in particular to improve model robustness and generalization under real-world challenges such as occlusion.

### Formula examples

The paper reports how performance metrics change under occlusion. For example, the changes in mean intersection-over-union (mIoU) and overall accuracy (OA) can be expressed as:

\[ \Delta \text{mIoU} = \text{mIoU}_{\text{OOD}} - \text{mIoU}_{\text{IID}} \]

\[ \Delta \text{OA} = \text{OA}_{\text{OOD}} - \text{OA}_{\text{IID}} \]

where \(\text{mIoU}_{\text{IID}}\) and \(\text{OA}_{\text{IID}}\) are the mean intersection-over-union and overall accuracy on the IID test set, and \(\text{mIoU}_{\text{OOD}}\) and \(\text{OA}_{\text{OOD}}\) are the corresponding metrics on the OOD test set. A negative value indicates that performance drops on the occluded data.
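The two differences above can be computed directly from per-point labels. The following is a minimal sketch in plain Python, not the authors' evaluation code. It assumes integer class labels per point and the common convention of averaging IoU only over classes that appear in the ground truth or the predictions. The function names (`confusion_matrix`, `mean_iou`, `overall_accuracy`, `occlusion_deltas`) and the toy labels are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count points; rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def overall_accuracy(cm: List[List[int]]) -> float:
    """Fraction of all points whose predicted class matches the label."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[c][c] for c in range(len(cm)))
    return correct / total


def mean_iou(cm: List[List[int]]) -> float:
    """Average per-class IoU = TP / (TP + FP + FN)."""
    ious = []
    for c in range(len(cm)):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                  # labeled c, predicted otherwise
        fp = sum(row[c] for row in cm) - tp   # predicted c, labeled otherwise
        denom = tp + fp + fn
        if denom > 0:  # assumption: skip classes absent from labels and predictions
            ious.append(tp / denom)
    return sum(ious) / len(ious)


def occlusion_deltas(true_iid: Sequence[int], pred_iid: Sequence[int],
                     true_ood: Sequence[int], pred_ood: Sequence[int],
                     num_classes: int) -> Dict[str, float]:
    """Return OOD-minus-IID differences for mIoU and OA."""
    cm_iid = confusion_matrix(true_iid, pred_iid, num_classes)
    cm_ood = confusion_matrix(true_ood, pred_ood, num_classes)
    return {
        "delta_miou": mean_iou(cm_ood) - mean_iou(cm_iid),
        "delta_oa": overall_accuracy(cm_ood) - overall_accuracy(cm_iid),
    }


# Toy labels (made up for illustration): perfect predictions on the IID set,
# two mistakes on the occluded (OOD) set.
labels = [0, 0, 1, 1, 2, 2]
deltas = occlusion_deltas(
    true_iid=labels, pred_iid=[0, 0, 1, 1, 2, 2],
    true_ood=labels, pred_ood=[0, 1, 1, 1, 2, 0],
    num_classes=3,
)
print(deltas)
```

Both values come out negative for the toy labels, which matches the sign convention of the formulas: the OOD score is lower than the IID score. Benchmarks differ in how they handle classes missing from a test split, so the skipping rule in `mean_iou` should be checked against the protocol actually used.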

BelHouse3D: A Benchmark Dataset for Assessing Occlusion Robustness in 3D Point Cloud Semantic Segmentation
