Abstract: Consumer-level RGB-D cameras have been widely used for dense 3D reconstruction of scenes. Especially for textureless or non-Lambertian surfaces, consumer RGB-D cameras can ensure completeness of the reconstructed models at a low cost. However, the reconstruction quality relies heavily on the accuracy of the depth sensors. Digital cameras are also widely used to capture high-resolution images for high-quality dense reconstruction, but they cannot handle textureless or non-Lambertian regions well because of visual ambiguity. To ensure both completeness and accuracy of the reconstructed 3D models, we propose a hybrid multi-view reconstruction pipeline named Hybrid-MVS, which combines the high-resolution images taken by a digital camera with the low-resolution RGB-D frames captured by a consumer RGB-D camera for robust reconstruction of complicated scenes with challenging textureless and non-Lambertian surfaces. Unlike most existing multi-sensor systems, which require explicit hardware calibration and synchronization of the various sensors, our pipeline solves the calibration and synchronization problems between the digital camera and the RGB-D camera implicitly, in order to composite reliable depth priors for the digital images. In particular, we propose a hybrid MVS framework for robust PatchMatch stereo and Delaunay meshing, which tightly couples the visual cues given by the digital images and the depth cues from the RGB-D frames to maximize their complementary advantages. Experiments with quantitative and qualitative evaluations demonstrate the effectiveness of the proposed Hybrid-MVS framework, which achieves high-quality 3D reconstruction of complicated natural scenes with robustness to weakly textured and non-Lambertian areas.

Adaptive region aggregation for multi‐view stereo matching using deformable convolutional networks

N2MVSNet: Non-Local Neighbors Aware Multi-View Stereo Network

MFE‐MVSNet: Multi‐scale feature enhancement multi‐view stereo with bi‐directional connections

CNLPA-MVS: Coarse-Hypotheses Guided Non-Local PatchMatch Multi-View Stereo

DSC-MVSNet: attention aware cost volume regularization based on depthwise separable convolution for multi-view stereo

AA-RMVSNet: Adaptive Aggregation Recurrent Multi-view Stereo Network

Multi-view depth estimation based on multi-feature aggregation for 3D reconstruction

Multi-View Stereo Representation Revisit: Region-Aware MVSNet

HC-MVSNet: A Probability Sampling-Based Multi-View-stereo Network with Hybrid Cascade Structure for 3D Reconstruction

Hybrid-MVS: Robust Multi-View Reconstruction with Hybrid Optimization of Visual and Depth Cues

Adaptive Cost Aggregation in Iterative Depth Estimation for Efficient Multi-view Stereo

Adaptive deformable convolutional network

Multi-View Stereo Network Based on Attention Mechanism and Neural Volume Rendering

Self-adaptive Multi-scale Aggregation Network for Stereo Matching

MTD-MVSNet: Multi-view Stereo Network with Multi-scale Transformer and Dual Attention

NR-MVSNet: Learning Multi-View Stereo Based on Normal Consistency and Depth Refinement

Enhanced feature pyramid for multi-view stereo with adaptive correlation cost volume

FA-MSVNet: multi-scale and multi-view feature aggregation methods for stereo 3D reconstruction

OD-MVSNet: Omni-dimensional dynamic multi-view stereo network

EI-MVSNet: Epipolar-Guided Multi-View Stereo Network With Interval-Aware Label

Mono‐MVS: textureless‐aware multi‐view stereo assisted by monocular prediction