SDL-MVS: View Space and Depth Deformable Learning Paradigm for Multi-View Stereo Reconstruction in Remote Sensing

Yong-Qiang Mao, Hanbo Bi, Liangyu Xu, Kaiqiang Chen, Zhirui Wang, Xian Sun, Kun Fu

DOI: https://doi.org/10.48550/arXiv.2405.17140

2024-05-27

Abstract: Research on multi-view stereo based on remote sensing images has promoted the development of large-scale urban 3D reconstruction. However, remote sensing multi-view image data suffers from occlusion and uneven brightness between views during acquisition, which leads to blurred details in depth estimation. To solve this problem, we re-examine the deformable learning method in the Multi-View Stereo task and propose a novel paradigm based on view Space and Depth deformable Learning (SDL-MVS), aiming to learn deformable interactions of features in different view spaces and deformably model the depth ranges and intervals to enable highly accurate depth estimation. Specifically, to solve the problem of view noise caused by occlusion and uneven brightness, we propose a Progressive Space deformable Sampling (PSS) mechanism, which performs deformable learning of sampling points in the 3D frustum space and the 2D image space in a progressive manner to embed source features into the reference feature adaptively. To further optimize the depth, we introduce Depth Hypothesis deformable Discretization (DHD), which achieves precise positioning of the depth prior by adaptively adjusting the depth range hypothesis and performing deformable discretization of the depth interval hypothesis. Finally, our SDL-MVS achieves explicit modeling of the occlusion and uneven brightness faced in multi-view stereo through the deformable learning paradigm of view space and depth, achieving accurate multi-view depth estimation. Extensive experiments on the LuoJia-MVS and WHU datasets show that our SDL-MVS reaches state-of-the-art performance. Notably, with three views as input, our SDL-MVS achieves an MAE of 0.086, an accuracy of 98.9% for <0.6m, and 98.9% for <3-interval on the LuoJia-MVS dataset.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses blurred details in depth estimation for remote sensing multi-view stereo reconstruction. During acquisition, remote sensing multi-view images often suffer from occlusion and inconsistent brightness between views. Both introduce noise into the matched features, which blurs the estimated depth and lowers the accuracy of the 3D reconstruction. To overcome this, the paper proposes a paradigm based on view-space and depth deformable learning (SDL-MVS), which learns deformable interactions of features across view spaces and deformably models the depth ranges and intervals.

The main contributions of the paper are:

1. Re-examining deformable learning in the multi-view stereo task and proposing SDL-MVS, a paradigm based on view-space and depth deformable learning, to handle occlusion and brightness inconsistency in remote sensing multi-view images.
2. Proposing a Progressive Space deformable Sampling (PSS) mechanism, which aggregates features from different views through progressive deformable learning in the 3D frustum space and the 2D image space, reducing the feature noise caused by occlusion and brightness inconsistency.
3. Introducing a Depth Hypothesis deformable Discretization (DHD) mechanism, which further optimizes multi-stage depth estimation by adaptively adjusting the depth range hypothesis and deformably discretizing the depth interval hypothesis, reducing the impact of adverse noise.
4. Reporting extensive experiments on the LuoJia-MVS and WHU datasets, in which SDL-MVS reaches state-of-the-art performance with three or five input views. On the LuoJia-MVS dataset with three-view input, SDL-MVS achieves an MAE of 0.086, an accuracy of 98.9% for <0.6 m, and an accuracy of 98.9% for <3-interval.
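The three figures quoted above (MAE, <0.6 m, <3-interval) can be read as per-pixel statistics on the absolute depth error. The sketch below is only an illustration of that reading, not the paper's evaluation code: the function name `depth_metrics`, the use of `None` to mark pixels without ground truth, and the `interval` argument (the depth sampling interval, in metres) are assumptions introduced here, and the paper's exact protocol (valid masks, outlier handling) may differ.

```python
from typing import Dict, Optional, Sequence


def depth_metrics(
    pred: Sequence[float],
    gt: Sequence[Optional[float]],
    interval: float,
    threshold_m: float = 0.6,
    n_intervals: float = 3.0,
) -> Dict[str, float]:
    """Compute MAE and two threshold accuracies over valid pixels.

    pred, gt  : flattened depth maps in metres; gt entries set to None
                are treated as invalid and skipped (an assumption).
    interval  : depth sampling interval in metres (hypothetical input).
    """
    # Absolute error for every pixel that has ground truth.
    errors = [abs(p - g) for p, g in zip(pred, gt) if g is not None]
    if not errors:
        raise ValueError("no valid ground-truth pixels")

    n = len(errors)
    return {
        # Mean absolute error in metres.
        "mae": sum(errors) / n,
        # Fraction of pixels with error below the metric threshold (<0.6 m).
        "acc_threshold": sum(e < threshold_m for e in errors) / n,
        # Fraction of pixels with error below n_intervals depth intervals.
        "acc_interval": sum(e < n_intervals * interval for e in errors) / n,
    }


if __name__ == "__main__":
    # Toy values, not data from the paper.
    metrics = depth_metrics(
        pred=[10.0, 20.5, 31.0, 40.0],
        gt=[10.1, 20.0, 30.0, None],
        interval=0.4,
    )
    print(metrics)
```

In this toy call the last pixel is skipped because it has no ground truth, so the statistics are taken over three pixels. Under this reading, a reported 98.9% for <0.6 m would mean that 98.9% of valid pixels have an absolute depth error below 0.6 m.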

LODM: Large-scale Online Dense Mapping for UAV

Hybrid-MVS: Robust Multi-View Reconstruction with Hybrid Optimization of Visual and Depth Cues

SD-MVS: Segmentation-Driven Deformation Multi-View Stereo with Spherical Refinement and EM optimization

NR-MVSNet: Learning Multi-View Stereo Based on Normal Consistency and Depth Refinement

High-Quality Depth Recovery Via Interactive Multi-view Stereo

A general deep learning based framework for 3D reconstruction from multi-view stereo satellite images

A Multitask Network for Multiview Stereo Reconstruction: When Semantic Consistency-Based Clustering Meets Depth Estimation Optimization

SA-SatMVS: Slope Feature-Aware and Across-Scale Information Integration for Large-Scale Earth Terrain Multi-View Stereo

High completeness multi-view stereo for dense reconstruction of large-scale urban scenes

RIAV-MVS: Recurrent-Indexing an Asymmetric Volume for Multi-View Stereo

RayMVSNet++: Learning Ray-based 1D Implicit Fields for Accurate Multi-View Stereo

HC-MVSNet: A Probability Sampling-Based Multi-View-stereo Network with Hybrid Cascade Structure for 3D Reconstruction

Deep learning based multi-view stereo matching and 3D scene reconstruction from oblique aerial images

Multi-View Stereo Representation Revisit: Region-Aware MVSNet

Semantic 3D Reconstruction with Learning MVS and 2D Segmentation of Aerial Images

Learning-based Multi-View Stereo: A Survey

LNMVSNet: A Low-Noise Multi-View Stereo Depth Inference Method for 3D Reconstruction

DDL-MVS: Depth Discontinuity Learning for MVS Networks

Adaptive Learning for Multi-view Stereo Reconstruction