Abstract: Aerial building depth estimation is a crucial task in 3D digital urban reconstruction, and learning-based multi-view stereo (MVS) methods have recently shown promising results in this field. However, these methods are mainly developed by adapting the general learning-based MVS framework to aerial depth estimation; they do not consider the intrinsic structure of buildings, which results in insufficient accuracy. Therefore, we propose an end-to-end edge-aware depth inference network for large-scale aerial building multi-view stereo, called EG-MVSNet, which incorporates building edge information and jointly estimates the depth map and the edge map. First, we propose a novel Edge-Sensitive Network based on differentiable Dynamic Sobel Kernels to obtain reliable building edge features while eliminating irrelevant features. We further propose a UNet-like Edge Prediction Branch and a Building Edge-Depth Loss to constrain the model to focus primarily on building edge features. Notably, the pseudo ground truth (GT) edge map for each aerial image is obtained with classical gradient operators, which do not require additional annotation. Second, to incorporate the edge features into the depth prediction module, we introduce an Inter-volume Adaptive Fusion Module that adaptively incorporates the edge feature volume into a standard cost volume and guides the regularization of the cost volume. An Edge Depth Refinement Module is further proposed to perform 2D-guided refinement and avoid over-smoothed or blurred depth boundaries. Extensive experiments on the WHU dataset and the LuoJia-MVS dataset show that our model significantly outperforms state-of-the-art methods, improving mean absolute error (MAE) by more than 22% relative to RED-Net and 57% relative to MVSNet. Additionally, to validate our proposed model, we reconstruct a synthetic aerial building benchmark based on the WHU dataset. In a between-method comparison, the accuracy of our results exceeds that of other MVS methods by at least 12% in MAE. The dataset and code are available at https://github.com/zs670980918/EG-MVSNet.
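
The abstract notes that pseudo GT edge maps come from classical gradient operators and need no extra annotation. The following is a minimal sketch of that idea, not the authors' implementation: it assumes the Sobel operator as the gradient operator (the abstract does not say which one is used), and the function name `sobel_edge_map` and the `threshold` value are hypothetical choices made for illustration.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical intensity gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_edge_map(gray, threshold=0.2):
    """Binary edge map for a 2D grayscale image given as a list of rows.

    `threshold` is a hypothetical cut-off applied to the gradient magnitude
    after normalising it to [0, 1]; the source does not state one.
    """
    h, w = len(gray), len(gray[0])

    def pixel(r, c):
        # Replicate border pixels so the output keeps the input size.
        return gray[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]

    magnitude = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            gx = gy = 0.0
            for i in range(3):
                for j in range(3):
                    value = pixel(r + i - 1, c + j - 1)
                    gx += SOBEL_X[i][j] * value
                    gy += SOBEL_Y[i][j] * value
            magnitude[r][c] = math.hypot(gx, gy)

    peak = max(max(row) for row in magnitude)
    if peak == 0:
        return [[0] * w for _ in range(h)]  # flat image: no edges
    return [[1 if m / peak >= threshold else 0 for m in row] for row in magnitude]


# Toy image: dark left half, bright right half (one vertical boundary).
image = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(6)]
edges = sobel_edge_map(image)
```

On the toy image, only the two columns that straddle the intensity step are marked as edges. A map like this is derived from the image alone, which is why no manual labelling is needed.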

EA-MVSNet: Learning Error-Awareness for Enhanced Multi-View Stereo

Multi-View Stereo Representation Revisit: Region-Aware MVSNet

EI-MVSNet: Epipolar-Guided Multi-View Stereo Network With Interval-Aware Label

MFE‐MVSNet: Multi‐scale feature enhancement multi‐view stereo with bi‐directional connections

NR-MVSNet: Learning Multi-View Stereo Based on Normal Consistency and Depth Refinement

EPP-MVSNet: Epipolar-assembling based Depth Prediction for Multi-view Stereo

ARAI-MVSNet: A multi-view stereo depth estimation network with adaptive depth range and depth interval

DSC-MVSNet: attention aware cost volume regularization based on depthwise separable convolution for multi-view stereo

RayMVSNet++: Learning Ray-based 1D Implicit Fields for Accurate Multi-View Stereo

Enhanced multi view 3D reconstruction with improved MVSNet

Multi-View Stereo Network Based on Attention Mechanism and Neural Volume Rendering

GeoMVSNet: Learning Multi-View Stereo with Geometry Perception

Vis-MVSNet: Visibility-Aware Multi-view Stereo Network

MTD-MVSNet: Multi-view Stereo Network with Multi-scale Transformer and Dual Attention

ES-MVSNet: Efficient Framework for End-to-end Self-supervised Multi-View Stereo

Multi-view depth estimation based on multi-feature aggregation for 3D reconstruction

MVSNet: Depth Inference for Unstructured Multi-view Stereo

Attention Aware Cost Volume Pyramid Based Multi-view Stereo Network for 3D Reconstruction

Edge aware depth inference for large-scale aerial building multi-view stereo

N2MVSNet: Non-Local Neighbors Aware Multi-View Stereo Network