A Cascade Network with Adaptive Depth Hypotheses Estimation for Multi-View Stereo and Image Three-Dimensional Reconstruction

Dong Wang, Zhong Liu, Haosong Yue, Xingming Wu, Weihai Chen
DOI: https://doi.org/10.1109/iciea61579.2024.10664909
2024-01-01
Abstract: Multi-view stereo (MVS) aims to infer geometric structure from a sequence of overlapping images, given predetermined calibrated camera parameters. It is one of the most important approaches in research on image-based three-dimensional (3D) reconstruction. With the significant advances in computer vision and artificial intelligence, deep learning methods have surpassed traditional ones in both reconstruction quality and runtime. Recent cascade MVS networks estimate high-resolution depth maps by building cost volume pyramids in a coarse-to-fine manner. However, the depth hypotheses of these methods are fixed, which limits the inference ability of the networks and leads to inaccurate depth maps. In this paper, we present a cascade network with adaptive depth hypotheses estimation. We generate an initial depth map at the coarsest level of an image over the entire depth range, and adaptively obtain hypothetical depth planes guided by the depth map from the previous level. Moreover, to efficiently improve the accuracy of point-cloud reconstruction, our proposed network concatenates the depth residual with the depth from the previous level to produce a refined depth map. Multi-level fusion better integrates the consistency information of global and local features and, to some extent, exploits the network's capacity for image understanding. Extensive experiments demonstrate competitive performance on the DTU and Tanks & Temples benchmarks, supporting further work on scene understanding and 3D reconstruction.
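The coarse-to-fine idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the per-pixel `half_range` input, nearest-neighbour upsampling, and the additive combination of residual and previous depth are all assumptions made for illustration, since the abstract does not specify how the adaptive search range or the residual fusion is computed.

```python
import numpy as np

def initial_hypotheses(d_min, d_max, num_planes, height, width):
    """Coarsest level: uniform depth planes over the entire depth range."""
    planes = np.linspace(d_min, d_max, num_planes, dtype=np.float32)
    return np.broadcast_to(planes[:, None, None], (num_planes, height, width)).copy()

def upsample_nearest(depth, factor=2):
    """Nearest-neighbour upsampling of a depth map (assumed for simplicity)."""
    return depth.repeat(factor, axis=0).repeat(factor, axis=1)

def adaptive_hypotheses(prev_depth, half_range, num_planes, d_min, d_max):
    """Per-pixel depth planes centred on the previous-level depth.

    half_range is a hypothetical per-pixel search radius (H, W); how the
    paper derives it is not stated in the abstract.
    """
    offsets = np.linspace(-1.0, 1.0, num_planes, dtype=np.float32)
    planes = prev_depth[None] + offsets[:, None, None] * half_range[None]
    return np.clip(planes, d_min, d_max)

def refine_depth(prev_depth, hypotheses, prob):
    """Add the expected depth residual to the previous-level depth.

    prob is a per-pixel probability volume (D, H, W) that sums to 1 over D;
    in a real network it would come from the regularised cost volume.
    """
    residual = (prob * (hypotheses - prev_depth[None])).sum(axis=0)
    return prev_depth + residual

# Usage with made-up numbers: a 2x2 coarse depth map refined at 4x4.
coarse_depth = np.array([[500.0, 550.0], [600.0, 650.0]], dtype=np.float32)
prev = upsample_nearest(coarse_depth)
radius = np.full_like(prev, 10.0)
hyps = adaptive_hypotheses(prev, radius, num_planes=5, d_min=400.0, d_max=900.0)
uniform = np.full_like(hyps, 1.0 / hyps.shape[0])
refined = refine_depth(prev, hyps, uniform)
```

With a uniform probability volume the expected residual is approximately zero, so the refined map stays at the previous-level depth; a peaked distribution shifts each pixel towards the favoured plane within its own search range.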