Adaptive Learning for Multi-view Stereo Reconstruction

Qinglu Min, Jie Zhao, Zhihao Zhang, Chen Min
2024-04-08
Abstract: Deep learning has recently demonstrated its excellent performance on the task of multi-view stereo (MVS). However, loss functions applied for deep MVS are rarely studied. In this paper, we first analyze existing loss functions' properties for deep depth based MVS approaches. Regression based loss leads to inaccurate continuous results by computing mathematical expectation, while classification based loss outputs discretized depth values. To this end, we then propose a novel loss function, named adaptive Wasserstein loss, which is able to narrow down the difference between the true and predicted probability distributions of depth. Besides, a simple but effective offset module is introduced to better achieve sub-pixel prediction accuracy. Extensive experiments on different benchmarks, including DTU, Tanks and Temples and BlendedMVS, show that the proposed method with the adaptive Wasserstein loss and the offset module achieves state-of-the-art performance.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems Addressed by the Paper

This paper addresses loss function design for Multi-view Stereo (MVS) reconstruction. Existing deep learning-based MVS methods have deficiencies in their choice of loss function, which lead to inaccurate depth predictions or a failure to reach sub-pixel accuracy. The paper analyzes the characteristics of existing loss functions and proposes a new loss function, the Adaptive Wasserstein Loss, along with a simple Offset Module, to improve the accuracy and continuity of depth predictions.

### Background and Problem Description

Multi-view Stereo (MVS) is the fundamental problem of recovering dense 3D representations from a set of multi-view images. It is widely used in fields such as augmented reality, 3D modeling, and autonomous driving. Traditional MVS methods perform well in texture-rich areas but struggle to achieve accurate and complete 3D reconstruction in low-texture regions, under lighting changes, and on reflective surfaces. In recent years, deep learning-based methods have achieved significant performance improvements in MVS tasks, but the design of loss functions has not received sufficient attention.

### Problems with Existing Methods

1. **Regression Loss**: Regression loss predicts continuous depth values by calculating the mathematical expectation, but for multi-peak distributions the expected value may be far from any predicted peak, leading to prediction errors.
2. **Classification Loss**: Classification loss outputs fixed discrete depth values, making it difficult to achieve sub-pixel accuracy in scenarios with a wide depth range.

### Proposed Methods

1. **Adaptive Wasserstein Loss**: This loss function can reduce the difference between the true depth distribution and the predicted depth distribution, even if they do not have a common support set.
2. **Offset Module**: By predicting the offset of each fixed discrete depth value, the discrete distribution is converted into a continuous distribution, thereby improving the accuracy of depth prediction.

### Experimental Results

Extensive experiments were conducted on multiple benchmark datasets, including DTU, Tanks and Temples, and BlendedMVS. The results show that the proposed method achieved state-of-the-art performance on these datasets, validating the effectiveness of the Adaptive Wasserstein Loss and the Offset Module.

### Main Contributions

1. Proposed a simple yet effective Offset Module that can generate continuous depth values.
2. Introduced the Adaptive Wasserstein Loss, which can reduce the difference between the predicted depth distribution and the true depth distribution even when they do not have a common support set.
3. Achieved state-of-the-art performance on the DTU and Tanks and Temples benchmark datasets.

Through these contributions, the paper provides a new loss function design approach for deep learning-based MVS methods, effectively improving the accuracy and robustness of depth predictions.
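
### Illustrative Sketch

The comparison of regression and classification losses above can be made concrete with a small numerical example. The sketch below is not the paper's implementation: the depth grid, the probability values, and the offsets are all made-up numbers, and `wasserstein_1d` is the plain one-dimensional Wasserstein-1 distance rather than the adaptive variant the paper proposes. It shows three things under those assumptions: the expectation of a two-peak distribution landing where the network assigns no probability, cross-entropy failing to distinguish a near miss from a far miss when the supports do not overlap, and a per-hypothesis offset turning a discrete pick into a continuous value.

```python
import numpy as np

# Hypothetical depth hypotheses, 1 unit apart (illustrative, not from the paper).
depths = np.arange(1.0, 9.0)  # 1, 2, ..., 8

# A made-up two-peak prediction with peaks at depth 2 and depth 7.
bimodal = np.array([0.05, 0.40, 0.05, 0.0, 0.0, 0.05, 0.40, 0.05])


def expected_depth(prob, depths):
    """Regression-style readout: expectation over the depth hypotheses."""
    return float(np.sum(prob * depths))


def argmax_depth(prob, depths):
    """Classification-style readout: the most probable discrete hypothesis."""
    return float(depths[np.argmax(prob)])


def cross_entropy(target, prob, eps=1e-12):
    """Cross-entropy between a target distribution and a prediction."""
    return float(-np.sum(target * np.log(prob + eps)))


def wasserstein_1d(p, q, depths):
    """Plain Wasserstein-1 distance on a shared, sorted 1D grid.

    In one dimension this equals the area between the two CDFs.
    """
    cdf_gap = np.abs(np.cumsum(p) - np.cumsum(q))[:-1]
    return float(np.sum(cdf_gap * np.diff(depths)))


def one_hot(index, size):
    """Distribution with all mass on a single hypothesis."""
    v = np.zeros(size)
    v[index] = 1.0
    return v


# 1) The expectation falls between the peaks, where the prediction has zero mass.
mean_depth = expected_depth(bimodal, depths)
peak_depth = argmax_depth(bimodal, depths)
print("expectation:", mean_depth, "argmax:", peak_depth)

# 2) Ground truth at depth 3; one prediction misses by 1 bin, the other by 5.
truth = one_hot(2, len(depths))
near = one_hot(3, len(depths))
far = one_hot(7, len(depths))
print("cross-entropy near/far:", cross_entropy(truth, near), cross_entropy(truth, far))
print("wasserstein   near/far:", wasserstein_1d(truth, near, depths),
      wasserstein_1d(truth, far, depths))

# 3) Hypothetical per-hypothesis offsets (stand-ins for a network's output)
#    shift the selected discrete depth to a continuous value.
offsets = np.full(len(depths), 0.3)
refined = depths + offsets
continuous_pick = float(refined[np.argmax(near)])
print("discrete pick:", argmax_depth(near, depths), "with offset:", continuous_pick)
```

In this toy setting the cross-entropy values for the near and far predictions are identical, while the Wasserstein distance grows with how far the probability mass has to move. That is the property the summary refers to when it says the loss remains informative without a common support set.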