Abstract: Consumer-level RGB-D cameras have been widely used for dense 3D reconstruction of scenes. For textureless or non-Lambertian surfaces in particular, consumer RGB-D cameras can ensure completeness of the reconstructed models at low cost. However, the reconstruction quality relies heavily on the accuracy of the depth sensors. Digital cameras are also commonly used to capture high-resolution images for high-quality dense reconstruction, but they cannot handle textureless or non-Lambertian regions well because of visual ambiguity. To ensure both completeness and accuracy of the reconstructed 3D models, we propose a hybrid multi-view reconstruction pipeline named Hybrid-MVS, which combines high-resolution images taken by a digital camera with low-resolution RGB-D frames captured by a consumer RGB-D camera for robust reconstruction of complicated scenes with challenging textureless and non-Lambertian surfaces. Unlike most existing multi-sensor systems, which require explicit hardware calibration and synchronization of the various sensors, our pipeline solves the calibration and synchronization problems between the digital camera and the RGB-D camera implicitly in order to composite reliable depth priors for the digital images. In particular, we propose a hybrid MVS framework for robust PatchMatch stereo and Delaunay meshing, which tightly couples the visual cues given by the digital images with the depth cues from the RGB-D frames to maximize their complementary advantages. Experiments with quantitative and qualitative evaluations demonstrate the effectiveness of the proposed Hybrid-MVS framework, which achieves high-quality 3D reconstruction of complicated natural scenes with robustness to weakly textured and non-Lambertian areas.

Cooperative Holistic 3D Scene Understanding from a Single RGB Image

Cooperative Holistic Scene Understanding: Unifying 3D Object, Layout, and Camera Pose Estimation

Learning 3D Scene Synthesis from Annotated RGB-D Images

Holistic++ Scene Understanding: Single-View 3D Holistic Scene Parsing and Human Pose Estimation With Human-Object Interaction and Physical Commonsense

Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image

Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes from a Single Image

Holistic 3D Scene Understanding from a Single Image with Implicit Representation

Holistic 3D Indoor Scene Parsing and Reconstruction from a Single RGB Image

DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization

Hybrid-MVS: Robust Multi-View Reconstruction with Hybrid Optimization of Visual and Depth Cues

Semantic Reconstruction based on RGB Image and Sparse Depth

Learning to Reconstruct and Understand Indoor Scenes from Sparse Views

Learning 3D Scene Priors with 2D Supervision

Indoor Scene Understanding with Geometric and Semantic Contexts

BUOL: A Bottom-Up Framework with Occupancy-aware Lifting for Panoptic 3D Scene Reconstruction From A Single Image

3D Room Layout Estimation from a Single RGB Image

Bridging Stereo Geometry and BEV Representation with Reliable Mutual Interaction for Semantic Scene Completion

Automatic Semantic Modeling of Indoor Scenes from Low-Quality RGB-D Data Using Contextual Information

An Interactive Approach to Semantic Modeling of Indoor Scenes with an RGBD Camera

Real-time Indoor Scene Reconstruction with RGBD and Inertial Input

Towards Panoptic 3D Parsing for Single Image in the Wild