Abstract: Consumer-level RGB-D cameras have been widely used for dense 3D reconstruction of scenes. For textureless or non-Lambertian surfaces in particular, consumer RGB-D cameras can ensure completeness of the reconstructed models at a low cost. However, the reconstruction quality relies heavily on the accuracy of the depth sensors. Digital cameras are also widely used to capture high-resolution images for high-quality dense reconstruction of scenes, but they cannot handle textureless or non-Lambertian regions well because of visual ambiguity. To ensure both completeness and accuracy of the reconstructed 3D models, we propose a hybrid multi-view reconstruction pipeline named Hybrid-MVS, which combines the high-resolution images taken by a digital camera with the low-resolution RGB-D frames captured by a consumer RGB-D camera for robust reconstruction of complicated scenes with challenging textureless and non-Lambertian surfaces. Unlike most existing multi-sensor systems, which require explicit hardware calibration and synchronization of the various sensors, our pipeline solves the calibration and synchronization problems between the digital camera and the RGB-D camera implicitly in order to composite reliable depth priors for the digital images. In particular, we propose a hybrid MVS framework for robust PatchMatch stereo and Delaunay meshing, which tightly couples the visual cues given by the digital images with the depth cues from the RGB-D frames to exploit their complementary advantages. Quantitative and qualitative experiments demonstrate the effectiveness of the proposed Hybrid-MVS framework, which achieves high-quality 3D reconstruction of complicated natural scenes and is robust to weakly textured and non-Lambertian areas.

Sub-pixel Convolution and Edge Detection for Multi-view Stereo

DSC-MVSNet: attention aware cost volume regularization based on depthwise separable convolution for multi-view stereo

Adaptive Cost Aggregation in Iterative Depth Estimation for Efficient Multi-view Stereo

Multi-View Stereo Representation Revist: Region-Aware MVSNet

Cost Volume Pyramid Based Depth Inference for Multi-View Stereo

Context-Guided Multi-view Stereo with Depth Back-Projection

MTD-MVSNet: Multi-view Stereo Network with Multi-scale Transformer and Dual Attention

Recurrent MVSNet for High-Resolution Multi-View Stereo Depth Inference

MFE‐MVSNet: Multi‐scale feature enhancement multi‐view stereo with bi‐directional connections

Transformer-guided Feature Pyramid Network for Multi-View Stereo

Attention-enhanced multi-source cost volume multi-view stereo

A Confidence-based Iterative Solver of Depths and Surface Normals for Deep Multi-view Stereo

Unsupervised multi-view stereo network based on multi-stage depth estimation

Attention Aware Cost Volume Pyramid Based Multi-view Stereo Network for 3D Reconstruction

HC-MVSNet: A Probability Sampling-Based Multi-View-stereo Network with Hybrid Cascade Structure for 3D Reconstruction

OD-MVSNet: Omni-dimensional dynamic multi-view stereo network

Enhanced feature pyramid for multi-view stereo with adaptive correlation cost volume

Hybrid-MVS: Robust Multi-View Reconstruction with Hybrid Optimization of Visual and Depth Cues

Effects of neonatal treatment with Tyr-MIF-1 and naloxone on the long-term body weight gain induced by repeated postnatal stressful stimuli

Multi-View Stereo Network with attention thin volume

BSI-MVS: multi-view stereo network with bidirectional semantic information