Abstract: Deep learning has recently demonstrated its excellent performance on the task of multi-view stereo (MVS). However, loss functions applied for deep MVS are rarely studied. In this paper, we first analyze existing loss functions' properties for deep depth based MVS approaches. Regression based loss leads to inaccurate continuous results by computing mathematical expectation, while classification based loss outputs discretized depth values. To this end, we then propose a novel loss function, named adaptive Wasserstein loss, which is able to narrow down the difference between the true and predicted probability distributions of depth. Besides, a simple but effective offset module is introduced to better achieve sub-pixel prediction accuracy. Extensive experiments on different benchmarks, including DTU, Tanks and Temples and BlendedMVS, show that the proposed method with the adaptive Wasserstein loss and the offset module achieves state-of-the-art performance.
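The contrast the abstract draws between regression and classification outputs can be illustrated with a small sketch. The depth hypotheses and the two-peaked probabilities below are made-up values chosen for illustration, and the helper names `expected_depth` and `argmax_depth` are hypothetical; none of this is taken from the paper's code.

```python
# Hypothetical depth hypotheses (in metres) and a two-peaked predicted
# distribution over them. These numbers are illustrative assumptions.
depths = [1.0, 2.0, 3.0, 4.0, 5.0]
probs = [0.45, 0.05, 0.0, 0.05, 0.45]


def expected_depth(depths, probs):
    """Regression-style output: the probability-weighted mean of the hypotheses."""
    return sum(d * p for d, p in zip(depths, probs))


def argmax_depth(depths, probs):
    """Classification-style output: the single most probable hypothesis."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return depths[best]


regressed = expected_depth(depths, probs)
classified = argmax_depth(depths, probs)

# The expectation falls between the two peaks, at a depth the network
# itself considers impossible (probability 0 at 3.0 m).
print("regression output:", regressed)
# The argmax can only return one of the fixed hypotheses, so a true depth
# of, say, 1.2 m cannot be represented exactly.
print("classification output:", classified)
```

With these values the expectation sits at the middle hypothesis, which carries zero probability, while the argmax is confined to the fixed grid of hypotheses.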

What problem does this paper attempt to address?

### Problems Addressed by the Paper

This paper aims to address the issue of loss function design in the task of Multi-view Stereo (MVS) reconstruction. Specifically, existing deep learning-based MVS methods have deficiencies in the choice of loss functions, leading to inaccurate depth predictions or failing to achieve sub-pixel level accuracy. This paper analyzes the characteristics of existing loss functions and proposes a new loss function, the Adaptive Wasserstein Loss, along with a simple Offset Module, to improve the accuracy and continuity of depth predictions.

### Background and Problem Description

Multi-view Stereo (MVS) is a fundamental problem of recovering dense 3D representations from a set of multi-view images, widely used in fields such as augmented reality, 3D modeling, and autonomous driving. Traditional MVS methods perform well in texture-rich areas but struggle to achieve accurate and complete 3D reconstruction in low-texture regions, under lighting changes, and reflections. In recent years, deep learning-based methods have achieved significant performance improvements in MVS tasks, but the design of loss functions has not received sufficient attention.

### Problems with Existing Methods

1. **Regression Loss**: Regression loss predicts continuous depth values by calculating the mathematical expectation, but in the case of multi-peak distributions, the expected value may be far from any predicted peak, leading to prediction errors.
2. **Classification Loss**: Classification loss outputs fixed discrete depth values, making it difficult to achieve sub-pixel accuracy in wide depth range scenarios.

### Proposed Methods

1. **Adaptive Wasserstein Loss**: This loss function can reduce the difference between the true depth distribution and the predicted depth distribution, even if they do not have a common support set.
2. **Offset Module**: By predicting the offset of each fixed discrete depth value, the discrete distribution is converted into a continuous distribution, thereby improving the accuracy of depth prediction.

### Experimental Results

Extensive experiments were conducted on multiple benchmark datasets, including DTU, Tanks and Temples, and BlendedMVS. The experimental results show that the proposed method achieved state-of-the-art performance on these datasets, validating the effectiveness of the Adaptive Wasserstein Loss and the Offset Module.

### Main Contributions

1. Proposed a simple yet effective Offset Module that can generate continuous depth values.
2. Introduced the Adaptive Wasserstein Loss, which can reduce the difference between the predicted depth distribution and the true depth distribution even when they do not have a common support set.
3. Achieved state-of-the-art performance on the DTU and Tanks and Temples benchmark datasets.

Through these contributions, this paper provides a new loss function design approach for deep learning-based MVS methods, effectively improving the accuracy and robustness of depth predictions.
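To make the "no common support set" point concrete, here is a minimal sketch of the ordinary 1D Wasserstein-1 distance between two discrete depth distributions, computed as the area between their cumulative distribution functions. This is the plain textbook distance, not the paper's adaptive variant, whose exact form the summary does not give. The helper names `apply_offsets` and `wasserstein_1d`, and all depth, offset, and probability values, are assumptions made for illustration.

```python
def apply_offsets(hypotheses, offsets):
    """Shift each fixed depth hypothesis by a per-hypothesis offset
    (hypothetical stand-in for the output of an offset module)."""
    return [h + o for h, o in zip(hypotheses, offsets)]


def wasserstein_1d(xs, ps, ys, qs):
    """Wasserstein-1 distance between two discrete 1D distributions.

    xs/ps are the support points and masses of the first distribution,
    ys/qs those of the second. The distance is the integral of
    |CDF_p - CDF_q| over the merged support.
    """
    mass_p, mass_q = {}, {}
    for x, p in zip(xs, ps):
        mass_p[x] = mass_p.get(x, 0.0) + p
    for y, q in zip(ys, qs):
        mass_q[y] = mass_q.get(y, 0.0) + q

    points = sorted(set(mass_p) | set(mass_q))
    cdf_p = cdf_q = 0.0
    total = 0.0
    for left, right in zip(points, points[1:]):
        cdf_p += mass_p.get(left, 0.0)
        cdf_q += mass_q.get(left, 0.0)
        total += abs(cdf_p - cdf_q) * (right - left)
    return total


# Ground truth: all mass at a depth that lies between the fixed hypotheses.
true_depths, true_probs = [2.3], [1.0]

# Prediction A: hypotheses near the truth, nudged by assumed offsets.
near_depths = apply_offsets([2.0, 3.0], [0.25, -0.1])
near_probs = [0.9, 0.1]

# Prediction B: hypotheses far from the truth, no offsets.
far_depths, far_probs = [4.0, 5.0], [0.9, 0.1]

near = wasserstein_1d(near_depths, near_probs, true_depths, true_probs)
far = wasserstein_1d(far_depths, far_probs, true_depths, true_probs)
print("W1 (near prediction):", near)
print("W1 (far prediction):", far)
```

Neither prediction shares a support point with the ground truth, so a bin-wise comparison would treat both as equally wrong. The transport-based distance still ranks the nearby prediction as much closer, which is the property the summary attributes to the Wasserstein loss.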

Adaptive Learning for Multi-view Stereo Reconstruction

Multi-View Stereo Representation Revist: Region-Aware MVSNet

Adaptive Range Guided Multi-view Depth Estimation with Normal Ranking Loss

Multistage Pixel-Visibility Learning with Cost Regularization for Multiview Stereo

EA-MVSNet: Learning Error-Awareness for Enhanced Multi-View Stereo

High-Quality Depth Recovery Via Interactive Multi-view Stereo

Mono‐MVS: textureless‐aware multi‐view stereo assisted by monocular prediction

A Confidence-based Iterative Solver of Depths and Surface Normals for Deep Multi-view Stereo

Learning Unsupervised Multi-View Stereopsis via Robust Photometric Consistency

A Light Multi-View Stereo Method with Patch-Uncertainty Awareness

Sparse Prior Guided Deep Multi-View Stereo

SDL-MVS: View Space and Depth Deformable Learning Paradigm for Multi-View Stereo Reconstruction in Remote Sensing

LNMVSNet: A Low-Noise Multi-View Stereo Depth Inference Method for 3D Reconstruction

Learning Inverse Depth Regression for Multi-View Stereo with Correlation Cost Volume

RayMVSNet++: Learning Ray-based 1D Implicit Fields for Accurate Multi-View Stereo

HC-MVSNet: A Probability Sampling-Based Multi-View-stereo Network with Hybrid Cascade Structure for 3D Reconstruction

DDL-MVS: Depth Discontinuity Learning for MVS Networks

Learning-based Multi-View Stereo: A Survey

Attention Aware Cost Volume Pyramid Based Multi-view Stereo Network for 3D Reconstruction

NR-MVSNet: Learning Multi-View Stereo Based on Normal Consistency and Depth Refinement

Multi-View Stereo Network Based on Attention Mechanism and Neural Volume Rendering