SDL-MVS: View Space and Depth Deformable Learning Paradigm for Multi-View Stereo Reconstruction in Remote Sensing

Yong-Qiang Mao, Hanbo Bi, Liangyu Xu, Kaiqiang Chen, Zhirui Wang, Xian Sun, Kun Fu
DOI: https://doi.org/10.48550/arXiv.2405.17140
2024-05-27
Abstract: Research on multi-view stereo based on remote sensing images has promoted the development of large-scale urban 3D reconstruction. However, remote sensing multi-view image data suffers from occlusion and uneven brightness between views during acquisition, which leads to blurred details in depth estimation. To solve this problem, we re-examine the deformable learning method in the Multi-View Stereo task and propose a novel paradigm based on view Space and Depth deformable Learning (SDL-MVS), aiming to learn deformable interactions of features in different view spaces and to deformably model the depth ranges and intervals to enable highly accurate depth estimation. Specifically, to solve the problem of view noise caused by occlusion and uneven brightness, we propose a Progressive Space deformable Sampling (PSS) mechanism, which performs deformable learning of sampling points in the 3D frustum space and the 2D image space in a progressive manner to embed source features into the reference feature adaptively. To further optimize the depth, we introduce Depth Hypothesis deformable Discretization (DHD), which achieves precise positioning of the depth prior by adaptively adjusting the depth range hypothesis and performing deformable discretization of the depth interval hypothesis. Finally, our SDL-MVS achieves explicit modeling of the occlusion and uneven brightness faced in multi-view stereo through the deformable learning paradigm of view space and depth, achieving accurate multi-view depth estimation. Extensive experiments on the LuoJia-MVS and WHU datasets show that our SDL-MVS reaches state-of-the-art performance. It is worth noting that, with three views as input, our SDL-MVS achieves an MAE of 0.086, an accuracy of 98.9% for <0.6m, and 98.9% for <3-interval on the LuoJia-MVS dataset.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the blurred details in depth estimation that arise in remote sensing multi-view stereo reconstruction. During acquisition, remote sensing multi-view images often suffer from occlusion and inconsistent brightness between views. Both effects blur the details of the estimated depth and therefore reduce the accuracy of the 3D reconstruction. To overcome this, the paper proposes a new paradigm based on view-space and depth deformable learning (SDL-MVS), which aims at high-precision depth estimation by learning deformable interactions of features in different view spaces and by deformably modeling depth ranges and intervals.
The main contributions of the paper are:
1. Re-examining deformable learning methods in the multi-view stereo task and proposing a new paradigm based on view-space and depth deformable learning (SDL-MVS) to handle occlusion and brightness inconsistency in remote sensing multi-view images.
2. Proposing a Progressive Space deformable Sampling mechanism (PSS), which aggregates features from different views through progressive deformable learning of sampling points in the 3D frustum space and the 2D image space, reducing the feature noise caused by occlusion and brightness inconsistency.
3. Introducing a Depth Hypothesis deformable Discretization mechanism (DHD), which further optimizes multi-stage depth estimation by adaptively adjusting the depth range hypothesis and deformably discretizing the depth interval hypothesis, reducing the impact of adverse noise.
4. Running extensive experiments on the LuoJia-MVS and WHU datasets, which show that SDL-MVS achieves new state-of-the-art performance with 3 or 5 input views. On the LuoJia-MVS dataset with three-view input, SDL-MVS achieves an MAE of 0.086, an accuracy of 98.9% for <0.6m, and an accuracy of 98.9% for <3-interval.
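To make the three reported figures concrete, the following is a minimal sketch of how such depth metrics are commonly computed. It is not the paper's evaluation code. The function name `depth_metrics`, the `interval` argument (the size of one depth interval in meters), and the rule that ground-truth pixels with non-positive or non-finite depth are invalid are all assumptions made for illustration.

```python
import numpy as np


def depth_metrics(pred, gt, interval, threshold_m=0.6, n_intervals=3):
    """Compute MAE and two threshold accuracies over valid pixels.

    Assumptions (not taken from the paper): a ground-truth pixel is valid
    when its depth is finite and positive, and `interval` is the size of
    one depth interval in meters.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)

    # Keep only pixels that have usable ground truth.
    valid = np.isfinite(gt) & (gt > 0)
    if not valid.any():
        raise ValueError("no valid ground-truth pixels")

    # Per-pixel absolute depth error in meters.
    err = np.abs(pred[valid] - gt[valid])

    return {
        # Mean absolute error over valid pixels.
        "mae": float(err.mean()),
        # Fraction of valid pixels with error below threshold_m (e.g. <0.6m).
        "acc_threshold": float((err < threshold_m).mean()),
        # Fraction of valid pixels with error below n_intervals depth intervals.
        "acc_interval": float((err < n_intervals * interval).mean()),
    }


# Toy example: the last pixel has no ground truth and is ignored.
example = depth_metrics(
    pred=[10.0, 10.5, 12.0, 5.0],
    gt=[10.0, 10.0, 10.0, 0.0],
    interval=1.0,
)
```

In this toy example the three valid pixels have errors of 0.0, 0.5, and 2.0 meters, so two of three fall under 0.6m while all three fall under three intervals of 1.0m each.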