DDL-MVS: Depth Discontinuity Learning for MVS Networks

Nail Ibrahimli, Hugo Ledoux, Julian Kooij, Liangliang Nan
2023-06-12
Abstract: Traditional multi-view stereo (MVS) methods have good accuracy but struggle with completeness, while recently developed learning-based MVS techniques have improved completeness at the cost of accuracy. We propose depth discontinuity learning for MVS methods, which further improves accuracy while retaining the completeness of the reconstruction. Our idea is to jointly estimate the depth and boundary maps, where the boundary maps are explicitly used for further refinement of the depth maps. We validate our idea and demonstrate that our strategies can be easily integrated into existing learning-based MVS pipelines where the reconstruction depends on high-quality depth map estimation. Extensive experiments on various datasets show that our method improves reconstruction quality compared to the baseline. Experiments also demonstrate that the presented model and strategies have good generalization capabilities. The source code will be available soon.
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper aims to address the issue of depth discontinuity in multi-view stereo (MVS) networks. Specifically, traditional MVS methods, while achieving high accuracy in reconstruction, lack completeness. In contrast, recent learning-based methods have improved completeness but at the cost of accuracy. This paper proposes an enhancement module called Depth Discontinuity Learning (DDL), which improves the quality of depth maps by jointly estimating depth maps and boundary maps, thereby enhancing accuracy while maintaining completeness.

### Main Contributions

1. **Multi-task Learning Architecture**: A novel multi-task learning architecture is proposed for the joint estimation of depth maps and object boundary maps.
2. **Dual-modal Depth Representation**: A dual-modal depth representation method is introduced, representing the depth of each pixel as a distribution rather than a single depth value, to explicitly represent depth uncertainty.
3. **Depth Discontinuity-based Spatial Regularization Loss Function**: A general loss function formulation is proposed to regularize depth maps through depth discontinuity, helping to learn depth discontinuities and optimize depth maps.

### Method Overview

1. **Feature Extraction**: A Feature Pyramid Network (FPN) is used to extract multi-scale features from color images.
2. **Coarse-to-fine PatchMatch Stereo (PMS)**: An initial depth map is generated in a coarse-to-fine manner.
3. **Depth Discontinuity Learning**: A 2D CNN-based U-Net architecture is used to estimate dual-modal depth density parameters for each pixel and generate geometric edge maps.
4. **Loss Functions**: The loss functions include depth-to-ground truth loss, edge-to-depth loss, smoothness loss, and dual-modal depth loss, which together optimize network performance.

### Experimental Results

The authors conducted extensive experiments on multiple benchmark datasets, including DTU, ETH3D, Tanks and Temples, and BlendedMVS. The experimental results show that the proposed method outperforms the baseline method PatchmatchNet in reconstruction quality and demonstrates strong generalization capabilities across different datasets.

### Conclusion

The proposed DDL method improves the reconstruction accuracy of multi-view stereo networks by jointly estimating depth maps and boundary maps while maintaining completeness. The experimental results on multiple datasets validate its effectiveness and generalization capability.
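
To make the dual-modal depth representation from the Method Overview more concrete, the following is a minimal sketch of a per-pixel two-mode depth distribution. The summary does not state which distribution family the paper uses, so the two-component Laplacian mixture here is an assumption, and the function names (`laplace_pdf`, `bimodal_pdf`, `bimodal_nll`, `select_depth`) and the example parameter values are hypothetical. It is an illustration of the idea, not the authors' implementation.

```python
import math


def laplace_pdf(d, mu, b):
    """Density of a Laplace distribution with location mu and scale b > 0."""
    return math.exp(-abs(d - mu) / b) / (2.0 * b)


def bimodal_pdf(d, pi, mu1, b1, mu2, b2):
    """Two-component mixture density; pi in [0, 1] weights the first mode."""
    return pi * laplace_pdf(d, mu1, b1) + (1.0 - pi) * laplace_pdf(d, mu2, b2)


def bimodal_nll(d_gt, pi, mu1, b1, mu2, b2, eps=1e-12):
    """Negative log-likelihood of a ground-truth depth under the mixture.

    A loss of this shape is one plausible form of a 'dual-modal depth loss';
    the exact formulation in the paper is not given in this summary.
    """
    return -math.log(bimodal_pdf(d_gt, pi, mu1, b1, mu2, b2) + eps)


def select_depth(pi, mu1, b1, mu2, b2):
    """Return the mode location where the mixture density is higher."""
    p1 = bimodal_pdf(mu1, pi, mu1, b1, mu2, b2)
    p2 = bimodal_pdf(mu2, pi, mu1, b1, mu2, b2)
    return mu1 if p1 >= p2 else mu2


# Hypothetical pixel on an object boundary: one mode on the foreground
# surface (depth 2.0) and one on the background surface (depth 5.0).
params = dict(pi=0.7, mu1=2.0, b1=0.05, mu2=5.0, b2=0.05)

depth = select_depth(**params)
weighted_mean = params["pi"] * params["mu1"] + (1.0 - params["pi"]) * params["mu2"]
```

In this example `select_depth` returns one of the two surface depths, whereas the weighted mean of the two modes falls between the surfaces, where no geometry exists. That difference is the usual motivation for a two-mode representation at depth discontinuities: a single regressed value tends to blur across the boundary, while a distribution with two modes can keep both candidates and commit to one.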