Abstract: Multi-view stereo (MVS) aims to reconstruct the dense 3D geometry of a scene by processing and relating images captured from different viewpoints. Despite impressive successes, most existing techniques simply supervise cost volumes or depth maps through conventional classification or regression methods, and thus do not fully exploit the potential of the depth representation. Moreover, reconstructing occluded or weakly textured areas remains a long-standing challenge in MVS. Another critical but frequently neglected issue is that ground-truth depths may be inaccurate, as evidenced in datasets such as DTU. To address these problems, we introduce EA-MVSNet, an error-aware MVS framework designed to enhance depth prediction. The key contributions of this work are threefold: (1) We present a novel error-aware depth representation that improves depth prediction accuracy through error-aware learning, thereby improving reconstruction quality. (2) We develop a Deformable Feature Pyramid Network (DFPN) designed to enhance reconstruction details in occluded and texture-deficient areas. (3) We introduce a cross-view consistency guidance module into the learning process, which mitigates the detrimental effects of inaccurate ground-truth depths and fosters faster convergence. Comprehensive experiments on the DTU and Tanks and Temples datasets validate the superiority of EA-MVSNet. Compared to the preceding UniMVSNet, EA-MVSNet reduces the overall reconstruction error on DTU by 7.6% and improves the mean F-score by 3.0% and 4.1% on the intermediate and advanced groups of Tanks and Temples, respectively, surpassing most recent state-of-the-art methods.
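The abstract contrasts EA-MVSNet with methods that supervise the cost volume through conventional classification or regression. The following is a minimal sketch of those two baselines for a single pixel, not of EA-MVSNet's error-aware representation, which the abstract does not specify. It assumes per-pixel matching scores over a small set of depth hypotheses, converted to probabilities with a softmax. Regression reads out the expected depth (soft-argmin) and is trained with an L1 loss. Classification reads out the most probable hypothesis and is trained with cross-entropy against the hypothesis nearest the ground truth. All function names are hypothetical.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Turn per-hypothesis matching scores into a probability vector."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def regression_depth(prob: List[float], hyps: List[float]) -> float:
    """Regression readout (soft-argmin): expected depth under the distribution."""
    return sum(p * d for p, d in zip(prob, hyps))


def classification_depth(prob: List[float], hyps: List[float]) -> float:
    """Classification readout (winner-take-all): depth of the most probable hypothesis."""
    best = max(range(len(prob)), key=lambda i: prob[i])  # first index wins on ties
    return hyps[best]


def regression_loss(prob: List[float], hyps: List[float], gt: float) -> float:
    """L1 loss between the expected depth and the ground-truth depth."""
    return abs(regression_depth(prob, hyps) - gt)


def classification_loss(prob: List[float], hyps: List[float], gt: float) -> float:
    """Cross-entropy with a one-hot target at the hypothesis closest to the ground truth."""
    target = min(range(len(hyps)), key=lambda i: abs(hyps[i] - gt))
    return -math.log(max(prob[target], 1e-12))  # clamp to avoid log(0)


if __name__ == "__main__":
    hyps = [500.0, 600.0, 700.0, 800.0]  # hypothetical depth hypotheses (mm)
    # Bimodal scores, e.g. a pixel near an occlusion boundary.
    prob = softmax([5.0, -50.0, -50.0, 5.0])
    print(f"regression:     {regression_depth(prob, hyps):.1f}")      # ~650.0, between the modes
    print(f"classification: {classification_depth(prob, hyps):.1f}")  # 500.0, snaps to one mode
```

The bimodal case illustrates why neither readout is fully satisfactory. When probability mass splits between two depths, the expectation lands between them at a depth that neither mode supports. The classification readout picks one hypothesis but cannot express depths that fall between hypotheses.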

M3VSNET: Unsupervised Multi-Metric Multi-View Stereo Network

NPF-MVSNet: Normal and Pyramid Feature Aided Unsupervised MVS Network

Unsupervised multi-view stereo network based on multi-stage depth estimation

Multi-View Stereo with Learnable Cost Metric

NR-MVSNet: Learning Multi-View Stereo Based on Normal Consistency and Depth Refinement

Multi-View Stereo Representation Revisit: Region-Aware MVSNet

HC-MVSNet: A Probability Sampling-Based Multi-View-Stereo Network with Hybrid Cascade Structure for 3D Reconstruction

Multi-View Stereo Network Based on Attention Mechanism and Neural Volume Rendering

Visibility-Aware Point-Based Multi-View Stereo Network

Attention Aware Cost Volume Pyramid Based Multi-view Stereo Network for 3D Reconstruction

EPP-MVSNet: Epipolar-assembling based Depth Prediction for Multi-view Stereo

Attention-guided Multi-view Stereo Network for Depth Estimation

P-MVSNet: Learning Patch-Wise Matching Confidence Aggregation for Multi-View Stereo

DRI-MVSNet: A Depth Residual Inference Network for Multi-View Stereo Images

N2MVSNet: Non-Local Neighbors Aware Multi-View Stereo Network

MTD-MVSNet: Multi-view Stereo Network with Multi-scale Transformer and Dual Attention

Self-supervised Multi-view Stereo Via Inter and Intra Network Pseudo Depth

Vis-MVSNet: Visibility-Aware Multi-view Stereo Network

DSC-MVSNet: attention aware cost volume regularization based on depthwise separable convolution for multi-view stereo

MVSNet: Depth Inference for Unstructured Multi-view Stereo

EA-MVSNet: Learning Error-Awareness for Enhanced Multi-View Stereo