Abstract: In this paper, we propose a practical three-dimensional (3D) real-scene reconstruction framework named Deep3D, which is paired with a deep learning based multi-view stereo (MVS) matching model named the adaptive multi-view aggregation matching (Ada-MVS) model, to obtain a 3D textured mesh model from multi-view oblique aerial images. Deep3D is the first deep learning based framework for 3D scene reconstruction. In Deep3D, aerial triangulation and view selection are first performed on the input images, and the depth map of each image is then inferred using the pretrained Ada-MVS model. After outlier filtering, all the inferred depth maps are fused into a dense point cloud. Finally, a 3D textured mesh is extracted from the dense point cloud as the final product. In the Ada-MVS model, a novel adaptive inter-view aggregation module is proposed to handle inconsistent information among oblique views and to fuse the multi-view matching costs into a robust cost volume. A lightweight recurrent regularization module is also designed to efficiently process large-size aerial images with large depth variations. Moreover, because few oblique aerial image datasets are currently available, we built a large-scale synthetic multi-view oblique aerial image dataset (the WHU-OMVS dataset) for training deep learning based models and evaluating methods for 3D scene reconstruction. The experimental results show three things. Firstly, compared with several relevant learning-based MVS methods, the proposed Ada-MVS model has clear advantages on large-size oblique aerial images. Secondly, on the WHU-OMVS dataset, a comprehensive comparison with popular commercial software packages and open-source solutions shows that the proposed Deep3D framework outperforms all the other solutions in reconstruction quality, and outperforms all the open-source solutions and some of the commercial packages in efficiency. Thirdly, the Deep3D framework shows stable generalization and excellent performance when applied to other oblique or nadir aerial images, without any further fine-tuning. The dataset and code will be available at http://gpcv.whu.edu.cn/data.
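The abstract names view selection as a step after aerial triangulation but does not say how it is done. As an illustration only, the sketch below ranks candidate source images for a reference image using the piecewise-Gaussian triangulation-angle score popularized by MVSNet (θ0 = 5°, σ1 = 1, σ2 = 10). It is an assumption that Deep3D uses this score. The helper names (`select_views`, `piecewise_gaussian`, `triangulation_angles`) and the input layout (tie points, a visibility matrix, camera centers) are hypothetical.

```python
import numpy as np

def piecewise_gaussian(theta_deg, theta0=5.0, sigma1=1.0, sigma2=10.0):
    """MVSNet-style preference for triangulation angles near theta0 (degrees)."""
    theta = np.asarray(theta_deg, dtype=float)
    # Narrow falloff below theta0, wide falloff above it.
    sigma = np.where(theta <= theta0, sigma1, sigma2)
    return np.exp(-((theta - theta0) ** 2) / (2.0 * sigma ** 2))

def triangulation_angles(points, center_ref, center_src):
    """Angle (degrees) at each 3D point between the rays to two camera centers."""
    r1 = center_ref - points
    r2 = center_src - points
    cos = np.sum(r1 * r2, axis=1) / (
        np.linalg.norm(r1, axis=1) * np.linalg.norm(r2, axis=1))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def select_views(points, visibility, centers, ref, num_src=4):
    """Rank source views for image `ref` by summed angle scores over shared tie points.

    points:     (P, 3) tie points from aerial triangulation (hypothetical input)
    visibility: (P, N) bool, True if point p is observed in image n
    centers:    (N, 3) camera centers
    """
    scores = np.zeros(len(centers))
    for j in range(len(centers)):
        if j == ref:
            continue
        shared = visibility[:, ref] & visibility[:, j]
        if not shared.any():
            continue  # no common tie points, score stays 0
        angles = triangulation_angles(points[shared], centers[ref], centers[j])
        scores[j] = piecewise_gaussian(angles).sum()
    order = np.argsort(-scores)
    return [int(j) for j in order if j != ref and scores[j] > 0][:num_src]
```

The asymmetric Gaussian penalizes near-zero baselines, which give poor depth precision, more sharply than wide baselines. Wide baselines are penalized more gently, though they increase appearance changes between oblique views.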
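The idea behind adaptive inter-view aggregation is to weight each source view's matching cost per pixel before fusing the costs into one volume. Views that are inconsistent with the reference then contribute less. The sketch below is not the Ada-MVS module, which presumably predicts its weights with learned layers. Here a hypothetical hand-crafted weight stands in for that network: how sharply each view's cost peaks along the depth axis. Group-wise correlation is also assumed as the per-view cost, and source features are assumed to be already warped to the reference view at every depth hypothesis.

```python
import numpy as np

def group_correlation(ref_feat, src_feat, groups):
    """Group-wise correlation between reference and warped source features.

    ref_feat: (C, H, W); src_feat: (D, C, H, W) source features warped to the
    reference view at D depth hypotheses. Returns a cost of shape (D, G, H, W).
    """
    D, C, H, W = src_feat.shape
    assert C % groups == 0, "channels must split evenly into groups"
    r = ref_feat.reshape(1, groups, C // groups, H, W)
    s = src_feat.reshape(D, groups, C // groups, H, W)
    return (r * s).mean(axis=2)

def adaptive_aggregate(ref_feat, warped_src_feats, groups=4, eps=1e-6):
    """Per-pixel weighted average of per-view costs (hypothetical weighting)."""
    costs, weights = [], []
    for src in warped_src_feats:
        cost = group_correlation(ref_feat, src, groups)      # (D, G, H, W)
        # Hypothetical confidence: peak of a softmax over depth. A flat cost
        # curve (e.g. occlusion) gives a small peak and thus a small weight.
        score = cost.mean(axis=1)                             # (D, H, W)
        prob = np.exp(score - score.max(axis=0, keepdims=True))
        prob /= prob.sum(axis=0, keepdims=True)
        weights.append(prob.max(axis=0))                      # (H, W), in [1/D, 1]
        costs.append(cost)
    w = np.stack(weights)[:, None, None]                      # (V, 1, 1, H, W)
    c = np.stack(costs)                                       # (V, D, G, H, W)
    return (w * c).sum(axis=0) / (w.sum(axis=0) + eps)        # (D, G, H, W)
```

Compared with plain variance or mean aggregation, a per-pixel weighted average lets one badly matched oblique view be down-weighted locally without discarding it elsewhere in the image.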
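The efficiency argument for recurrent regularization is that the regularizer sweeps the cost volume one depth slice at a time. Its hidden state therefore covers a single H×W slice rather than the full D×H×W volume that a 3D CNN keeps in memory, which matters for large images with wide depth ranges. The sketch below uses a per-pixel GRU with random, untrained weights to stay short. Real recurrent MVS regularizers, and presumably Ada-MVS, use convolutional recurrent units. The final expectation-based depth regression (`regress_depth`, hypothetical name) is one common choice, not necessarily the paper's.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class DepthGRU:
    """Per-pixel GRU swept over depth slices (simplified, convolution-free sketch)."""

    def __init__(self, in_ch, hid_ch, seed=0):
        rng = np.random.default_rng(seed)

        def init(*shape):
            return rng.standard_normal(shape) * 0.1  # random, untrained weights

        self.Wz, self.Wr, self.Wh = (init(in_ch + hid_ch, hid_ch) for _ in range(3))
        self.Wo = init(hid_ch, 1)
        self.hid_ch = hid_ch

    def __call__(self, cost):
        """cost: (D, G, H, W) -> depth probability volume (D, H, W)."""
        D, G, H, W = cost.shape
        h = np.zeros((H, W, self.hid_ch))        # hidden state for one slice only
        logits = np.empty((D, H, W))
        for d in range(D):
            x = np.moveaxis(cost[d], 0, -1)      # (H, W, G)
            xh = np.concatenate([x, h], axis=-1)
            z = sigmoid(xh @ self.Wz)            # update gate
            r = sigmoid(xh @ self.Wr)            # reset gate
            h_tilde = np.tanh(np.concatenate([x, r * h], axis=-1) @ self.Wh)
            h = (1.0 - z) * h + z * h_tilde
            logits[d] = (h @ self.Wo)[..., 0]    # one regularized cost per pixel
        # Softmax along depth turns the regularized costs into probabilities.
        logits -= logits.max(axis=0, keepdims=True)
        prob = np.exp(logits)
        return prob / prob.sum(axis=0, keepdims=True)

def regress_depth(prob, depth_values):
    """Expected depth per pixel under the probability volume."""
    return np.tensordot(depth_values, prob, axes=(0, 0))  # (H, W)
```

Usage: `prob = DepthGRU(in_ch=4, hid_ch=8)(cost)` followed by `regress_depth(prob, np.linspace(d_min, d_max, D))` yields a depth map bounded by the hypothesis range. Here `d_min` and `d_max` are placeholders for the scene's depth range.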
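The fusion step amounts to back-projecting each filtered depth map into world coordinates and concatenating the points from all views. The abstract does not say how outliers are filtered. The sketch below assumes a simple per-pixel confidence threshold (`conf_thresh`, a hypothetical parameter), the pinhole convention x_cam = R X_world + t, and integer pixel coordinates. Practical pipelines usually add a cross-view geometric consistency check, which is omitted here.

```python
import numpy as np

def backproject(depth, K, R, t):
    """Lift an (H, W) depth map to (H, W, 3) world points, assuming x_cam = R @ X_world + t."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))           # u: column, v: row
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T.astype(float)
    cam = np.linalg.inv(K) @ pix * depth.reshape(1, -1)       # camera-frame points (3, H*W)
    world = R.T @ (cam - t.reshape(3, 1))                     # invert the extrinsics
    return world.T.reshape(H, W, 3)

def fuse(depth_maps, confidences, cameras, conf_thresh=0.5):
    """Keep confident, valid pixels from every view and merge them into one cloud."""
    clouds = []
    for depth, conf, (K, R, t) in zip(depth_maps, confidences, cameras):
        mask = (conf >= conf_thresh) & (depth > 0)            # assumed outlier filter
        clouds.append(backproject(depth, K, R, t)[mask])
    return np.concatenate(clouds, axis=0)                     # (N, 3)
```

The resulting point cloud is the input to the meshing and texturing stage, which this sketch does not cover.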

Deep learning based multi-view stereo matching and 3D scene reconstruction from oblique aerial images