Abstract: Consumer-level RGB-D cameras are widely used for dense 3D reconstruction of scenes. For textureless or non-Lambertian surfaces in particular, they can ensure completeness of the reconstructed models at low cost. However, the reconstruction quality relies heavily on the accuracy of the depth sensor. Digital cameras are also widely used to capture high-resolution images for high-quality dense reconstruction, but they cannot handle textureless or non-Lambertian regions well because of visual ambiguity. To ensure both completeness and accuracy of the reconstructed 3D models, we propose a hybrid multi-view reconstruction pipeline named Hybrid-MVS, which combines high-resolution images taken by a digital camera with low-resolution RGB-D frames captured by a consumer RGB-D camera for robust reconstruction of complicated scenes with challenging textureless and non-Lambertian surfaces. Unlike most existing multi-sensor systems, which require explicit hardware calibration and synchronization of the sensors, our pipeline solves the calibration and synchronization problems between the digital camera and the RGB-D camera implicitly when compositing reliable depth priors for the digital images. In particular, we propose a hybrid MVS framework for robust PatchMatch stereo and Delaunay meshing, which tightly couples the visual cues from the digital images with the depth cues from the RGB-D frames to exploit their complementary advantages. Quantitative and qualitative experiments demonstrate the effectiveness of the proposed Hybrid-MVS framework, which achieves high-quality 3D reconstruction of complicated natural scenes and is robust to weakly textured and non-Lambertian areas.

Volumetric 3D Reconstruction with Window-Wise Global Feature Aggregation

Recurrent Volume-based 3D Feature Fusion for Real-time Multi-view Object Pose Estimation

3D Former: Monocular Scene Reconstruction with 3D SDF Transformers

VisFusion: Visibility-aware Online 3D Scene Reconstruction from Videos

FA-MSVNet: multi-scale and multi-view feature aggregation methods for stereo 3D reconstruction

Incremental Dense Reconstruction from Monocular Video with Guided Sparse Feature Volume Fusion

Indoor Scene Reconstruction From Monocular Video Combining Contextual and Geometric Priors

Hybrid-MVS: Robust Multi-View Reconstruction with Hybrid Optimization of Visual and Depth Cues

CVRecon: Rethinking 3D Geometric Feature Learning for Neural Reconstruction

DP-MVS: Detail Preserving Multi-View Surface Reconstruction of Large-Scale Scenes

Multi-view depth estimation based on multi-feature aggregation for 3D reconstruction

360Recon: An Accurate Reconstruction Method Based on Depth Fusion from 360 Images

2L3: Lifting Imperfect Generated 2D Images into Accurate 3D

VolRecon: Volume rendering of signed ray distance functions for generalizable multi-view reconstruction

ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model

Multi-view 3D Reconstruction with Transformer

Multi-View 3D Reconstruction Based on FEWO-MVSNet

Single-view 3D Reconstruction Algorithm Based on View-aware

VoRTX: Volumetric 3D Reconstruction With Transformers for Voxelwise View Selection and Fusion

Georecon: a coarse-to-fine visual 3D reconstruction approach for high-resolution images with neural matching priors