Abstract: This paper proposes a new method for simultaneous 3D reconstruction and semantic segmentation of indoor scenes. Unlike existing methods, which require recording a video with a color camera and/or a depth camera, our method needs only a small number (e.g., 3-5) of color images from uncalibrated sparse views as input. This greatly simplifies data acquisition and extends the range of applicable scenarios. Because the views overlap only slightly, our method must be able to infer the depth and semantic information of the scene from a single image. The key issue is therefore how to recover relatively accurate depth from single images and how to reconstruct a 3D scene by fusing very few depth maps. To address this problem, we first design an iterative deep architecture, IterNet, that estimates depth and semantic segmentation alternately, so that each task benefits from the other. To handle the small overlap and non-rigid transformations between views, we further propose a joint global and local registration method that reconstructs a 3D scene with semantic information from sparse views. We also release a new synthetic indoor dataset that provides photorealistic high-resolution RGB images, accurate depth maps, and pixel-level semantic labels for thousands of complex layouts, for use in training and evaluation. Experimental results on public datasets and on our dataset demonstrate that, compared with state-of-the-art methods, our method achieves more accurate depth estimation, lower semantic segmentation error, and better 3D reconstruction results.
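Fusing per-view depth maps requires bringing them into a common frame. As an illustration of what a global (rigid) alignment step can look like, the sketch below uses a standard technique and is not the paper's registration method. It assumes known pinhole intrinsics (the hypothetical `fx`, `fy`, `cx`, `cy`; the paper's views are uncalibrated, so these would have to be estimated) and already-matched pixel correspondences between two views. It back-projects the matched pixels to 3D and solves for a single least-squares rigid transform with the Kabsch SVD method. The function names `backproject` and `rigid_align` are hypothetical, and the local, non-rigid correction described in the abstract is not covered.

```python
import numpy as np


def backproject(depth, pixels, fx, fy, cx, cy):
    """Lift (u, v) pixels of a depth map to 3D points in camera coordinates.

    Assumes a pinhole camera with hypothetical intrinsics fx, fy, cx, cy.
    """
    u = pixels[:, 0]
    v = pixels[:, 1]
    z = depth[v, u]  # depth maps are indexed as [row, column] = [v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)


def rigid_align(src, dst):
    """Kabsch: least-squares R, t such that R @ src[i] + t ~= dst[i]."""
    src_mean = src.mean(axis=0)
    dst_mean = dst.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets.
    H = (src - src_mean).T @ (dst - dst_mean)
    U, _, Vt = np.linalg.svd(H)
    # Correct a possible reflection so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_mean - R @ src_mean
    return R, t


# Usage: recover a known rotation and translation from synthetic correspondences.
rng = np.random.default_rng(0)
depth = rng.uniform(1.0, 4.0, size=(48, 64))       # synthetic depth map (rows x cols)
pixels = rng.integers(0, [64, 48], size=(50, 2))   # (u, v) pixel coordinates
pts_a = backproject(depth, pixels, fx=60.0, fy=60.0, cx=32.0, cy=24.0)

angle = np.deg2rad(20.0)
R_true = np.array([[np.cos(angle), 0.0, np.sin(angle)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(angle), 0.0, np.cos(angle)]])
t_true = np.array([0.5, -0.1, 0.2])
pts_b = pts_a @ R_true.T + t_true  # the same points expressed in a second view's frame

R, t = rigid_align(pts_a, pts_b)
print(np.allclose(R, R_true), np.allclose(t, t_true))  # True True
```

With noise-free correspondences the rigid transform is recovered exactly. With depth maps predicted from single images, the residual after this step is generally not zero, which is the kind of distortion a subsequent local, non-rigid refinement would have to absorb.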