Abstract: Traditional multi-view stereo (MVS) methods have good accuracy but struggle with completeness, while recently developed learning-based MVS techniques have improved completeness at the cost of accuracy. We propose depth discontinuity learning for MVS methods, which further improves accuracy while retaining the completeness of the reconstruction. Our idea is to jointly estimate the depth and boundary maps, where the boundary maps are explicitly used for further refinement of the depth maps. We validate our idea and demonstrate that our strategies can be easily integrated into existing learning-based MVS pipelines in which the reconstruction depends on high-quality depth map estimation. Extensive experiments on various datasets show that our method improves reconstruction quality compared to the baseline. Experiments also demonstrate that the presented model and strategies have good generalization capabilities. The source code will be available soon.

What problem does this paper attempt to address?

### Problems Addressed by the Paper

The paper aims to address the issue of depth discontinuity in multi-view stereo (MVS) networks. Specifically, traditional MVS methods, while achieving high accuracy in reconstruction, lack completeness. In contrast, recent learning-based methods have improved completeness but at the cost of accuracy. This paper proposes an enhanced module called Depth Discontinuity Learning (DDL), which improves the quality of depth maps by jointly estimating depth maps and boundary maps, thereby enhancing accuracy while maintaining completeness.

### Main Contributions

1. **Multi-task Learning Architecture**: A novel multi-task learning architecture is proposed for the joint estimation of depth maps and object boundary maps.
2. **Dual-modal Depth Representation**: A dual-modal depth representation method is introduced, representing the depth of each pixel as a distribution rather than a single depth value, to explicitly represent depth uncertainty.
3. **Depth Discontinuity-based Spatial Regularization Loss Function**: A general loss function formula is proposed to regularize depth maps through depth discontinuity, helping to learn depth discontinuity and optimize depth maps.

### Method Overview

1. **Feature Extraction**: A Feature Pyramid Network (FPN) is used to extract multi-scale features from color images.
2. **Coarse-to-fine PatchMatch Stereo (PMS)**: An initial depth map is generated in a coarse-to-fine manner.
3. **Depth Discontinuity Learning**: A 2D CNN-based U-Net architecture is used to estimate dual-modal depth density parameters for each pixel and generate geometric edge maps.
4. **Loss Functions**: The loss functions include depth-to-ground truth loss, edge-to-depth loss, smoothness loss, and dual-modal depth loss, which together optimize network performance.

### Experimental Results

The authors conducted extensive experiments on multiple benchmark datasets, including DTU, ETH3D, "Tanks and Temples," and BlendedMVS. The experimental results show that the proposed method outperforms the baseline method PatchmatchNet in reconstruction quality and demonstrates strong generalization capabilities across different datasets.

### Conclusion

The proposed DDL method significantly improves the reconstruction accuracy of multi-view stereo networks by jointly estimating depth maps and boundary maps while maintaining completeness. The experimental results on multiple datasets validate its effectiveness and generalization capability.
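
To make two of the ideas from the method overview more concrete, the following NumPy sketch illustrates a dual-modal (two-component) per-pixel depth distribution and a smoothness term that is relaxed at predicted boundaries. It is not the authors' implementation: the choice of Laplacian components, the rule for picking the final depth, the exact form of the smoothness weighting, and all function and parameter names (`bimodal_nll`, `select_depth`, `edge_aware_smoothness`, `pi`, `mu1`, `b1`, ...) are assumptions made purely for illustration.

```python
import numpy as np


def laplace_pdf(d, mu, b):
    """Density of a Laplace distribution with location mu and scale b."""
    return np.exp(-np.abs(d - mu) / b) / (2.0 * b)


def bimodal_nll(d_gt, pi, mu1, b1, mu2, b2, eps=1e-12):
    """Per-pixel negative log-likelihood of the ground-truth depth under
    a two-component mixture (assumed Laplacian here, not confirmed by the source)."""
    p = pi * laplace_pdf(d_gt, mu1, b1) + (1.0 - pi) * laplace_pdf(d_gt, mu2, b2)
    return -np.log(p + eps)


def select_depth(pi, mu1, b1, mu2, b2):
    """Pick, per pixel, the mode whose weighted component has the higher peak.
    Simplification: overlap between the two components is ignored."""
    peak1 = pi / (2.0 * b1)
    peak2 = (1.0 - pi) / (2.0 * b2)
    return np.where(peak1 >= peak2, mu1, mu2)


def edge_aware_smoothness(depth, edge):
    """Mean absolute depth gradient, down-weighted where the edge map
    (values in [0, 1]) marks a depth discontinuity."""
    dx = np.abs(depth[:, 1:] - depth[:, :-1])
    dy = np.abs(depth[1:, :] - depth[:-1, :])
    # A neighbouring pair is treated as crossing a boundary if either pixel is an edge.
    wx = 1.0 - np.maximum(edge[:, 1:], edge[:, :-1])
    wy = 1.0 - np.maximum(edge[1:, :], edge[:-1, :])
    return (wx * dx).mean() + (wy * dy).mean()


if __name__ == "__main__":
    # Toy 4x4 depth map with a vertical step between columns 1 and 2.
    depth = np.array([[1.0, 1.0, 2.0, 2.0]] * 4)
    no_edges = np.zeros_like(depth)
    edges = np.zeros_like(depth)
    edges[:, 2] = 1.0  # boundary predicted at the step

    print("smoothness without edges:", edge_aware_smoothness(depth, no_edges))
    print("smoothness with edges:   ", edge_aware_smoothness(depth, edges))
```

In this toy case the step in the depth map is penalized when no boundary is predicted, and the penalty vanishes once the edge map marks the step. That is the intuition behind letting the boundary map guide depth refinement: smoothing is enforced inside surfaces but not across discontinuities.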

DDL-MVS: Depth Discontinuity Learning for MVS Networks

NR-MVSNet: Learning Multi-View Stereo Based on Normal Consistency and Depth Refinement

LNMVSNet: A Low-Noise Multi-View Stereo Depth Inference Method for 3D Reconstruction

HC-MVSNet: A Probability Sampling-Based Multi-View-stereo Network with Hybrid Cascade Structure for 3D Reconstruction

Multi-View Stereo Representation Revisit: Region-Aware MVSNet

EA-MVSNet: Learning Error-Awareness for Enhanced Multi-View Stereo

Hybrid-MVS: Robust Multi-View Reconstruction with Hybrid Optimization of Visual and Depth Cues

SDL-MVS: View Space and Depth Deformable Learning Paradigm for Multi-View Stereo Reconstruction in Remote Sensing

Multi-View Stereo Network Based on Attention Mechanism and Neural Volume Rendering

EI-MVSNet: Epipolar-Guided Multi-View Stereo Network With Interval-Aware Label

MTD-MVSNet: Multi-view Stereo Network with Multi-scale Transformer and Dual Attention

RIAV-MVS: Recurrent-Indexing an Asymmetric Volume for Multi-View Stereo

DSC-MVSNet: attention aware cost volume regularization based on depthwise separable convolution for multi-view stereo

Adaptive Learning for Multi-view Stereo Reconstruction

GC-MVSNet: Multi-View, Multi-Scale, Geometrically-Consistent Multi-View Stereo

Vis-MVSNet: Visibility-Aware Multi-view Stereo Network

LoliMVS: An End-to-End Network for Multiview Stereo With Low-Light Images

RayMVSNet++: Learning Ray-based 1D Implicit Fields for Accurate Multi-View Stereo

Mono‐MVS: textureless‐aware multi‐view stereo assisted by monocular prediction

3DVNet: Multi-View Depth Prediction and Volumetric Refinement

BSI-MVS: multi-view stereo network with bidirectional semantic information