Self-Supervised Multi-View Stereo with Adaptive Depth Priors

Lintao Xiang, Hujun Yin
DOI: https://doi.org/10.1109/icip51287.2024.10648017
2024-01-01
Abstract: Although supervised multi-view 3D reconstruction methods have recently achieved satisfactory performance, they have major limitations, including the high cost of collecting 3D data and poor generalization to unseen scenes. Hence, unsupervised 3D reconstruction approaches based on photometric consistency are being explored. However, lighting variations across views and reflective surfaces within a scene can undermine the reliability of these approaches. In this paper, we propose adaptive depth priors as pseudo-labels to guide the optimization of self-supervised multi-view stereo. First, sparse depth priors are generated with conventional structure-from-motion (SfM) and multi-view stereo (MVS) algorithms; these are then fed into a monocular depth estimation network to learn the adapted depth priors. In addition, a spatial-frequency fusion structure is designed to enhance global perception in MVS feature matching by combining local dependencies from the spatial domain with global contextual information from the frequency domain. Extensive experiments on the DTU and Tanks & Temples datasets demonstrate that the proposed ADP-MVSNet achieves markedly improved results over existing unsupervised approaches and even outperforms some supervised methods.
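The idea behind spatial-frequency fusion can be illustrated with a toy NumPy sketch. This is not the ADP-MVSNet module, which the abstract does not specify in detail. The function names (`local_branch`, `global_branch`, `spatial_frequency_fusion`), the box filter, the fixed `freq_weight` array, and the additive fusion are all assumptions chosen for illustration; in a real network the filters and frequency weights would be learned.

```python
import numpy as np

def local_branch(feat, k=3):
    """Spatial branch (assumed): k x k box filter, so each output pixel
    depends only on its immediate neighbourhood."""
    pad = k // 2
    padded = np.pad(feat, pad, mode="edge")
    h, w = feat.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

def global_branch(feat, freq_weight):
    """Frequency branch (assumed): reweight the 2D spectrum, then invert.
    Every frequency bin mixes all pixels, so the output is global."""
    spec = np.fft.rfft2(feat)
    return np.fft.irfft2(spec * freq_weight, s=feat.shape)

def spatial_frequency_fusion(feat, freq_weight):
    """Hypothetical fusion: sum of the local and global responses."""
    return local_branch(feat) + global_branch(feat, freq_weight)

# A single impulse at the top-left corner of an 8 x 8 feature map.
feat = np.zeros((8, 8))
feat[0, 0] = 1.0

# Keep only the DC component in the frequency branch.
freq_weight = np.zeros((8, 5))
freq_weight[0, 0] = 1.0

fused = spatial_frequency_fusion(feat, freq_weight)
far_local = local_branch(feat)[7, 7]    # impulse is outside the 3 x 3 window
far_fused = fused[7, 7]                 # frequency branch still responds
```

In this toy setup the box filter leaves the far corner untouched, while the frequency branch spreads the impulse across the whole map. That is the sense in which a frequency-domain path can supply global context that a small spatial kernel cannot.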