Multi-View Stereo with Learnable Cost Metric

Guidong Yang, Xunkuai Zhou, Chuanxiang Gao, Benyun Zhao, Jihan Zhang, Yizhou Chen, Xi Chen, Ben M. Chen
DOI: https://doi.org/10.1109/IROS55552.2023.10341606
2023-10-01
Abstract: In this paper, we present LCM-MVSNet, a novel multi-view stereo (MVS) network with a learnable cost metric (LCM) for more accurate and complete depth estimation and dense point cloud reconstruction. To adapt to scene variation and improve reconstruction quality in non-Lambertian, low-textured scenes, we propose LCM, which adaptively aggregates multi-view matching similarity into the 3D cost volume by leveraging sparse point hints. The proposed LCM benefits MVS approaches in four ways: it enhances depth estimation, improves reconstruction quality, reduces the memory footprint, and alleviates the computational burden. Together, these allow depth inference on high-resolution images, yielding more accurate and complete reconstructions. Moreover, we improve depth estimation by enhancing the propagation of shallow features via a bottom-up path, and we strengthen end-to-end supervision by adapting the focal loss to reduce the ambiguity caused by sample imbalance. Extensive experiments on two benchmark datasets show that our network achieves state-of-the-art performance on the DTU dataset and generalizes well, achieving competitive performance on the Tanks and Temples benchmark. Furthermore, we deploy LCM-MVSNet in a real-world application for large-scale 3D reconstruction from multi-view aerial images collected by a self-developed UAV, demonstrating the robustness and scalability of our method. More detailed results are available in the Appendix (shorturl.at/rBG28).
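The abstract states that the focal loss is adapted for end-to-end supervision but does not give the formula. As a minimal sketch, assuming depth is inferred as a classification over D discrete depth hypotheses (a common setup in cost-volume MVS networks) and that the standard focal loss FL(p_t) = -(1 - p_t)^γ log(p_t) is applied to the probability p_t of the ground-truth hypothesis at each valid pixel, the loss could look like the code below. The function name `depth_focal_loss`, its arguments, and the default γ = 2.0 are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def depth_focal_loss(prob_volume, gt_index, valid_mask, gamma=2.0, eps=1e-8):
    """Hypothetical focal loss over per-pixel depth-hypothesis probabilities.

    prob_volume: (D, H, W) softmax probabilities over D depth hypotheses.
    gt_index:    (H, W) integer index of the hypothesis closest to the GT depth.
    valid_mask:  (H, W) boolean mask of pixels that have valid ground truth.
    gamma:       focusing parameter; gamma = 0 reduces to cross-entropy.
    """
    h, w = gt_index.shape
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Probability the network assigns to the ground-truth hypothesis per pixel.
    p_t = np.clip(prob_volume[gt_index, rows, cols], eps, 1.0)
    # (1 - p_t)^gamma down-weights easy, already-confident pixels so that
    # hard or ambiguous pixels dominate the gradient (the sample-imbalance fix).
    loss = -((1.0 - p_t) ** gamma) * np.log(p_t)
    n_valid = valid_mask.sum()
    if n_valid == 0:
        return 0.0
    # Average only over pixels with valid ground-truth depth.
    return float(loss[valid_mask].sum() / n_valid)


# Usage: a uniform distribution over 4 hypotheses gives p_t = 0.25 everywhere.
D, H, W = 4, 2, 3
prob = np.full((D, H, W), 0.25)
gt = np.zeros((H, W), dtype=int)
mask = np.ones((H, W), dtype=bool)
print(depth_focal_loss(prob, gt, mask, gamma=0.0))  # ≈ 1.386, i.e. log(4)
print(depth_focal_loss(prob, gt, mask, gamma=2.0))  # ≈ 0.780, i.e. 0.75**2 * log(4)
```

Setting γ = 0 recovers plain cross-entropy, which makes it easy to check how much the focusing term reshapes the supervision for confidently classified versus ambiguous pixels.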