Abstract: Consumer-level RGB-D cameras are widely used for dense 3D reconstruction of scenes. For textureless or non-Lambertian surfaces in particular, they can ensure the completeness of the reconstructed models at low cost. However, the reconstruction quality relies heavily on the accuracy of the depth sensors. Digital cameras are also widely used to capture high-resolution images for high-quality dense reconstruction, but they cannot handle textureless or non-Lambertian regions well because of visual ambiguity. To ensure both completeness and accuracy of the reconstructed 3D models, we propose a hybrid multi-view reconstruction pipeline named Hybrid-MVS, which combines high-resolution images taken by a digital camera with low-resolution RGB-D frames captured by a consumer RGB-D camera for robust reconstruction of complicated scenes with challenging textureless and non-Lambertian surfaces. Unlike most existing multi-sensor systems, which require explicit hardware calibration and synchronization of the sensors, our pipeline solves the calibration and synchronization problems between the digital camera and the RGB-D camera implicitly in order to composite reliable depth priors for the digital images. In particular, we propose a hybrid MVS framework for robust PatchMatch stereo and Delaunay meshing, which tightly couples the visual cues from the digital images with the depth cues from the RGB-D frames to exploit their complementary advantages. Experiments with quantitative and qualitative evaluations demonstrate the effectiveness of the proposed Hybrid-MVS framework, which achieves high-quality 3D reconstruction of complicated natural scenes and is robust to weakly textured and non-Lambertian areas.

Sparse multi-view hand-object reconstruction for unseen environments

In-Hand 3D Object Reconstruction from a Monocular RGB Video

CAMInterHand: Cooperative Attention for Multi-View Interactive Hand Pose and Mesh Reconstruction

Learning Hand-Held Object Reconstruction from In-The-Wild Videos

Personalized Hand Modeling from Multiple Postures with Multi‐View Color Images

Stereo Hand-Object Reconstruction for Human-to-Robot Handover

3D Reconstruction of Objects in Hands without Real World 3D Supervision

Contact-conditioned Hand-Held Object Reconstruction from Single-View Images

MOHO: Learning Single-view Hand-held Object Reconstruction with Multi-view Occlusion-Aware Supervision

Reconstructing Hand-Held Objects from Monocular Video

Novel-view Synthesis and Pose Estimation for Hand-Object Interaction from Sparse Views

HandO: a Hybrid 3D Hand–object Reconstruction Model for Unknown Objects

HandFormer: Hand Pose Reconstructing from a Single RGB Image

EasyHOI: Unleashing the Power of Large Models for Reconstructing Hand-Object Interactions in the Wild

RealisticHands: A Hybrid Model for 3D Hand Reconstruction

Joint Hand-Object 3D Reconstruction From a Single Image With Cross-Branch Feature Fusion

Resolving hand‐object occlusion for mixed reality with joint deep learning and model optimization

Model-based 3D Hand Reconstruction via Self-Supervised Learning

Single Depth View Based Real-Time Reconstruction of Hand-Object Interactions

Reconstructing Hand-Held Objects in 3D from Images and Videos

Hybrid-MVS: Robust Multi-View Reconstruction with Hybrid Optimization of Visual and Depth Cues