Abstract: Consumer-level RGB-D cameras are widely used for dense 3D reconstruction of scenes. For textureless or non-Lambertian surfaces in particular, they can ensure completeness of the reconstructed models at low cost. However, the reconstruction quality relies heavily on the accuracy of the depth sensor. Digital cameras are also widely used to capture high-resolution images for high-quality dense reconstruction, but they cannot handle textureless or non-Lambertian regions well because of visual ambiguity. To ensure both completeness and accuracy of the reconstructed 3D models, we propose a hybrid multi-view reconstruction pipeline named Hybrid-MVS, which combines high-resolution images taken by a digital camera with low-resolution RGB-D frames captured by a consumer RGB-D camera for robust reconstruction of complicated scenes with challenging textureless and non-Lambertian surfaces. Unlike most existing multi-sensor systems, which require explicit hardware calibration and synchronization of the sensors, our pipeline solves the calibration and synchronization problems between the digital camera and the RGB-D camera implicitly in order to composite reliable depth priors for the digital images. In particular, we propose a hybrid MVS framework for robust PatchMatch stereo and Delaunay meshing, which tightly couples the visual cues from the digital images with the depth cues from the RGB-D frames to exploit their complementary advantages. Experiments with quantitative and qualitative evaluations demonstrate the effectiveness of the proposed Hybrid-MVS framework, which achieves high-quality 3D reconstruction of complicated natural scenes and is robust to weakly textured and non-Lambertian areas.
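The idea of coupling visual cues with depth cues can be illustrated with a small sketch. This is not the Hybrid-MVS cost function, which the abstract does not specify; it is a minimal example under stated assumptions. The function names (`ncc_cost`, `hybrid_cost`, `pick_depth`), the Gaussian-shaped penalty, and the parameters `prior_weight` and `sigma` are all hypothetical choices made for illustration. The sketch scores depth hypotheses by a photometric term plus a penalty for deviating from an RGB-D depth prior, so that the prior decides where the photometric term is uninformative.

```python
import math

def ncc_cost(ref_patch, src_patch):
    """Photometric cost in [0, 2]: 1 minus normalized cross-correlation."""
    n = len(ref_patch)
    mean_r = sum(ref_patch) / n
    mean_s = sum(src_patch) / n
    dr = [r - mean_r for r in ref_patch]
    ds = [s - mean_s for s in src_patch]
    denom = math.sqrt(sum(x * x for x in dr) * sum(x * x for x in ds))
    if denom < 1e-12:
        # Textureless patch: NCC is undefined, so return a neutral cost.
        return 1.0
    return 1.0 - sum(a * b for a, b in zip(dr, ds)) / denom

def hybrid_cost(photo_cost, depth, prior_depth, prior_weight=0.5, sigma=0.05):
    """Add a depth-prior penalty to a photometric cost (assumed form).

    prior_depth is None where no RGB-D measurement is available.
    prior_weight and sigma are illustrative values, not from the paper.
    """
    if prior_depth is None:
        return photo_cost
    rel_err = abs(depth - prior_depth) / prior_depth
    # Penalty rises smoothly from 0 (agrees with prior) towards 1 (far off).
    penalty = 1.0 - math.exp(-(rel_err ** 2) / (2.0 * sigma ** 2))
    return photo_cost + prior_weight * penalty

def pick_depth(candidates, photo_costs, prior_depth):
    """Return the candidate depth with the lowest hybrid cost."""
    scored = [(hybrid_cost(c, d, prior_depth), d)
              for d, c in zip(candidates, photo_costs)]
    return min(scored)[1]
```

In a textureless region every candidate depth gets roughly the same photometric cost, so in this sketch the prior term breaks the tie; where the image is well textured and no prior is available, the photometric term alone selects the depth.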

DiFT: Differentiable Differential Feature Transform for Multi-View Stereo

Learning Efficient Photometric Feature Transform for Multi-view Stereo

Hybrid-MVS: Robust Multi-View Reconstruction with Hybrid Optimization of Visual and Depth Cues

High-Quality Depth Recovery Via Interactive Multi-view Stereo

A Differential Volumetric Approach to Multi-View Photometric Stereo

High-Quality Depth Recovery via Interactive MultiView Stereo Supplementary Document

Learning Photometric Feature Transform for Free-form Object Scan

A Light Multi-View Stereo Method with Patch-Uncertainty Awareness

A Learning-based Framework for Hybrid Depth-from-Defocus and Stereo Matching

Stereoscopic video conversion based on depth tracking

MV-DUSt3R+: Single-Stage Scene Reconstruction from Sparse Views In 2 Seconds

Multi-Scale Geometric Consistency Guided Multi-View Stereo

A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding

Deep 3D Capture: Geometry and Reflectance from Sparse Multi-View Images

A Multi-View Fusion Method Via Tensor Learning And Gradient Descent For Image Features

CHOSEN: Contrastive Hypothesis Selection for Multi-View Depth Refinement

DTR-Map: A Digital Twin-Enabled Real-Time Mapping System Based on Multi-View Stereo

Multi-view Stereo Via Depth Map Fusion: A Coordinate Decent Optimization Method

High-Quality RGB-D Reconstruction via Multi-View Uncalibrated Photometric Stereo and Gradient-SDF

DiVE: DiT-based Video Generation with Enhanced Control

Robust active stereo vision using Kullback-Leibler divergence