Abstract: Learning-based multi-view stereo (MVS) methods have made remarkable progress in recent years. However, they remain fragile under occlusion and in weakly or repetitively textured image regions, where excessive pixel-matching errors often leave holes in the final point cloud. To address these challenges, we propose a novel MVS network assisted by monocular prediction for 3D reconstruction. Our approach combines the strengths of a monocular branch and a multi-view branch: it leverages the semantic information that monocular prediction extracts from a single image together with the strict geometric relationships between multiple images. Moreover, we adopt a coarse-to-fine strategy that gradually reduces the number of assumed depth planes and narrows the interval between them as the resolution of the input images increases over the network's iterations. This strategy balances computational resource consumption against the effectiveness of the model. Experiments on the DTU, Tanks and Temples, and BlendedMVS datasets demonstrate that our method achieves outstanding results, particularly in textureless regions.
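The coarse-to-fine idea in the abstract (fewer depth planes and a narrower interval at each finer stage) can be illustrated with a minimal sketch. This is not the authors' implementation: the three-stage layout, the plane counts (48, 32, 8), the interval ratios, the resolution scales, and the depth range used in the example are all hypothetical values chosen for illustration, and the "nearest hypothesis" selection is only a stand-in for a network's depth regression.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    scale: float      # image resolution relative to full size (hypothetical)
    num_planes: int   # number of assumed depth planes at this stage (hypothetical)
    interval: float   # spacing between adjacent depth planes


def coarse_to_fine_schedule(depth_min, depth_max,
                            planes=(48, 32, 8),
                            ratios=(1.0, 0.5, 0.25),
                            scales=(0.25, 0.5, 1.0)):
    """Build a per-stage schedule: fewer planes and a smaller interval
    as the resolution grows. All default values are illustrative."""
    base_interval = (depth_max - depth_min) / (planes[0] - 1)
    return [Stage(s, n, base_interval * r)
            for s, n, r in zip(scales, planes, ratios)]


def depth_hypotheses(center, num_planes, interval):
    """Return num_planes depth values spaced by interval, centred on center."""
    half_span = (num_planes - 1) * interval / 2.0
    return [center - half_span + i * interval for i in range(num_planes)]


def refine(true_depth, schedule, depth_min, depth_max):
    """Toy refinement for one pixel: at each stage, keep the hypothesis
    closest to true_depth and re-centre the next, narrower search on it.
    A real network would regress depth from a cost volume instead."""
    center = (depth_min + depth_max) / 2.0
    for stage in schedule:
        hyps = depth_hypotheses(center, stage.num_planes, stage.interval)
        center = min(hyps, key=lambda d: abs(d - true_depth))
    return center


if __name__ == "__main__":
    d_min, d_max = 425.0, 935.0  # hypothetical depth range
    schedule = coarse_to_fine_schedule(d_min, d_max)
    for stage in schedule:
        print(stage)
    print(refine(600.0, schedule, d_min, d_max))
```

With these illustrative settings, the three stages evaluate 48 + 32 + 8 = 88 hypotheses per pixel, whereas sampling the full range at the finest interval in a single pass would take roughly 4 x 47 + 1 = 189 planes. That is the kind of saving the abstract's trade-off between resource consumption and effectiveness refers to.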

Self-supervised Multi-view Stereo Via Inter and Intra Network Pseudo Depth

Unsupervised multi-view stereo network based on multi-stage depth estimation

A Depth Estimation Framework Based on Unsupervised Learning and Cross-Modal Translation

Multi-View Stereo Representation Revisit: Region-Aware MVSNet

Real-Time Unsupervised Multi-View Depth Estimation Network for Virtual View Synthesis

Semi-supervised Deep Multi-view Stereo

A contrastive learning based unsupervised multi-view stereo with multi-stage self-training strategy

CL-MVSNet: Unsupervised Multi-View Stereo with Dual-Level Contrastive Learning

Digging into Uncertainty in Self-supervised Multi-view Stereo

Mono‐MVS: textureless‐aware multi‐view stereo assisted by monocular prediction

Learning Unsupervised Multi-View Stereopsis via Robust Photometric Consistency

HC-MVSNet: A Probability Sampling-Based Multi-View-stereo Network with Hybrid Cascade Structure for 3D Reconstruction

EI-MVSNet: Epipolar-Guided Multi-View Stereo Network With Interval-Aware Label

GeoMVSNet: Learning Multi-View Stereo with Geometry Perception

N2MVSNet: Non-Local Neighbors Aware Multi-View Stereo Network

Multi-View Stereo Network Based on Attention Mechanism and Neural Volume Rendering

MVSNet: Depth Inference for Unstructured Multi-view Stereo

Multi-View Stereo Representation Revisit: Region-Aware MVSNet

Attention Aware Cost Volume Pyramid Based Multi-view Stereo Network for 3D Reconstruction

Multi-View Stereo Network with attention thin volume

Stereo Matching by Self-supervision of Multiscopic Vision