Abstract: Surface prediction and completion have been widely studied in various applications. Recently, research in surface completion has evolved from small objects to complex large-scale scenes. As a result, researchers have begun increasing the volume of data and leveraging a greater variety of data modalities, including rendered RGB images, descriptive texts, depth images, etc., to enhance algorithm performance. However, existing datasets suffer from a deficiency in the amounts of scene-level models along with the corresponding multi-modal information. Therefore, a method to scale the datasets and generate multi-modal information in them efficiently is essential. To bridge this research gap, we propose MASSTAR: a Multi-modal lArge-scale Scene dataset with a verSatile Toolchain for surfAce pRediction and completion. We develop a versatile and efficient toolchain for processing the raw 3D data from the environments. It screens out a set of fine-grained scene models and generates the corresponding multi-modal data. Utilizing the toolchain, we then generate an example dataset composed of over a thousand scene-level models with partial real-world data added. We compare MASSTAR with the existing datasets, which validates its superiority: the ability to efficiently extract high-quality models from complex scenarios to expand the dataset. Additionally, several representative surface completion algorithms are benchmarked on MASSTAR, which reveals that existing algorithms can hardly deal with scene-level completion. We will release the source code of our toolchain and the dataset. For more details, please see our project page at

What problem does this paper attempt to address?

This paper addresses several key challenges in current surface prediction and completion research, in particular the shortage of datasets for large-scale scenes. Specifically:

1. **Insufficient dataset scale and modality diversity**: Existing datasets usually contain models of small-scale objects, such as chairs and tables, and lack models of large-scale scenes (such as buildings and forests). They also lack modality diversity, concentrating on one or a few data types, such as 3D mesh models or RGB images.
2. **Lack of real-world data**: Most existing datasets are composed mainly of synthetic models and lack real-world multi-modal data. Because of the domain gap between simulation and reality, algorithms trained on them perform poorly in practical applications.
3. **Limited dataset expansion ability**: Existing datasets usually have a fixed scale and lack an effective toolchain for expanding them efficiently, which limits the development of research.

To solve these problems, the paper proposes MASSTAR (Multi-Modal and Large-Scale Scene Dataset with a Versatile Toolchain for Surface Prediction and Completion), a multi-modal dataset of large-scale scenes together with an efficient toolchain. The main contributions of MASSTAR include:

1. **Developed a versatile and efficient toolchain**: The toolchain screens out high-quality 3D mesh models from real-world or synthetic environments and generates the corresponding multi-modal information, such as images, descriptive texts, and point clouds.
2. **Created a multi-modal large-scale scene dataset**: The dataset contains more than 1,000 scene-level 3D mesh models, some of which come from real-world data.
3. **Conducted benchmark tests on representative surface completion algorithms**: The results show that existing surface completion algorithms perform poorly on scene-level tasks, which highlights the importance of MASSTAR in promoting relevant research.
4. **Plans to open-source the toolchain and dataset**: The authors plan to release the source code of the toolchain and sample datasets for researchers to use and improve.

Through these contributions, MASSTAR aims to promote research in surface prediction and completion, especially for large-scale and complex scenes.
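As a rough illustration of one step such a toolchain performs, deriving a point cloud from a 3D mesh model, the following is a minimal sketch using area-weighted surface sampling with NumPy. It is not the MASSTAR implementation, which is not described here; the function name `sample_points_on_mesh` and its parameters are hypothetical and chosen only for this example.

```python
import numpy as np


def sample_points_on_mesh(vertices, faces, n_points, seed=0):
    """Sample points uniformly over the surface of a triangle mesh.

    Hypothetical helper for illustration only (not part of MASSTAR).
    vertices: (V, 3) float array, faces: (F, 3) int array of vertex indices.
    """
    rng = np.random.default_rng(seed)
    tri = vertices[faces]                      # (F, 3, 3) triangle corners
    a, b, c = tri[:, 0], tri[:, 1], tri[:, 2]

    # Triangle areas, used so larger triangles receive more samples.
    areas = 0.5 * np.linalg.norm(np.cross(b - a, c - a), axis=1)
    face_idx = rng.choice(len(faces), size=n_points, p=areas / areas.sum())

    # Uniform barycentric weights via the square-root trick.
    r1 = np.sqrt(rng.random(n_points))
    r2 = rng.random(n_points)
    w = np.stack([1.0 - r1, r1 * (1.0 - r2), r1 * r2], axis=1)  # (N, 3)

    # Weighted combination of each chosen triangle's three corners.
    return np.einsum("nk,nkd->nd", w, tri[face_idx])


# Usage: a unit square in the z = 0 plane, built from two triangles.
square_vertices = np.array(
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
)
square_faces = np.array([[0, 1, 2], [0, 2, 3]])
points = sample_points_on_mesh(square_vertices, square_faces, n_points=1000)
```

Weighting the face choice by area keeps the sampling density uniform over the surface regardless of how finely each region is triangulated, which matters for scene-level meshes with very uneven triangle sizes.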

MASSTAR: A Multi-Modal and Large-Scale Scene Dataset with a Versatile Toolchain for Surface Prediction and Completion

MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations

STAR: A First-Ever Dataset and A Large-Scale Benchmark for Scene Graph Generation in Large-Size Satellite Imagery

Enhanced Multi-Scale Feature Adaptive Fusion Sparse Convolutional Network for Large-Scale Scenes Semantic Segmentation

Zero-Shot Multi-Object Scene Completion

TO-Scene: A Large-scale Dataset for Understanding 3D Tabletop Scenes

ESC-Net: Alleviating Triple Sparsity on 3D LiDAR Point Clouds for Extreme Sparse Scene Completion

DP-MVS: Detail Preserving Multi-View Surface Reconstruction of Large-Scale Scenes

MegaScenes: Scene-Level View Synthesis at Scale

MSeg: A Composite Dataset for Multi-domain Semantic Segmentation

Model2Scene: Learning 3D Scene Representation via Contrastive Language-CAD Models Pre-training

A multi-modal garden dataset and hybrid 3D dense reconstruction framework based on panoramic stereo images for a trimming robot

SceneFactory: A Workflow-centric and Unified Framework for Incremental Scene Modeling

A construction method of a large-scale physical rendering 3D semantic segmentation dataset

SSCBench: A Large-Scale 3D Semantic Scene Completion Benchmark for Autonomous Driving

GAMUS: A Geometry-aware Multi-modal Semantic Segmentation Benchmark for Remote Sensing Data

Semantic Segmentation-assisted Scene Completion for LiDAR Point Clouds

X-MAS: Extremely Large-Scale Multi-Modal Sensor Dataset for Outdoor Surveillance in Real Environments

MSC-AD: A Multiscene Unsupervised Anomaly Detection Dataset for Small Defect Detection of Casting Surface

PointSSC: A Cooperative Vehicle-Infrastructure Point Cloud Benchmark for Semantic Scene Completion

MCD: Diverse Large-Scale Multi-Campus Dataset for Robot Perception