Abstract: Consumer-level RGB-D cameras are widely used for dense 3D reconstruction of scenes. For textureless or non-Lambertian surfaces in particular, they can ensure the completeness of the reconstructed models at low cost. However, the reconstruction quality relies heavily on the accuracy of the depth sensors. Digital cameras are also widely used to capture high-resolution images for high-quality dense reconstruction, but they cannot handle textureless or non-Lambertian regions well because of visual ambiguity. To ensure both completeness and accuracy of the reconstructed 3D models, we propose a hybrid multi-view reconstruction pipeline named Hybrid-MVS, which combines the high-resolution images taken by a digital camera with the low-resolution RGB-D frames captured by a consumer RGB-D camera for robust reconstruction of complicated scenes with challenging textureless and non-Lambertian surfaces. Unlike most existing multi-sensor systems, which require explicit hardware calibration and synchronization of the various sensors, our pipeline solves the calibration and synchronization problems between the digital camera and the RGB-D camera implicitly in order to composite reliable depth priors for the digital images. In particular, we propose a hybrid MVS framework for robust PatchMatch stereo and Delaunay meshing, which tightly couples the visual cues given by the digital images with the depth cues from the RGB-D frames to exploit their complementary advantages. Quantitative and qualitative experiments demonstrate the effectiveness of the proposed Hybrid-MVS framework, which achieves high-quality 3D reconstruction of complicated natural scenes and is robust to weakly textured and non-Lambertian areas.

Hybrid-MVS: Robust Multi-View Reconstruction with Hybrid Optimization of Visual and Depth Cues
