Abstract: Multi-View Stereo (MVS) is a fundamental problem in geometric computer vision that aims to reconstruct a scene from multi-view images with known camera parameters. However, mainstream approaches represent the scene with a fixed all-pixel depth range and an equal depth interval partition, which results in inadequate utilization of depth planes and imprecise depth estimation. In this paper, we present a novel multi-stage coarse-to-fine framework that achieves an adaptive all-pixel depth range and adaptive depth intervals. We predict a coarse depth map in the first stage. In the second stage, we propose an Adaptive Depth Range Prediction module that zooms in on the scene by leveraging the reference image and the depth map obtained in the first stage, and predicts a more accurate all-pixel depth range for the following stages. In the third and fourth stages, we propose an Adaptive Depth Interval Adjustment module that achieves an adaptive variable interval partition of the pixel-wise depth range. The depth interval distribution in this module is normalized by Z-score, which allocates dense depth hypothesis planes around the potential ground truth depth value and sparser planes elsewhere to achieve more accurate depth estimation. Extensive experiments on four widely used benchmark datasets (DTU, TnT, BlendedMVS, ETH3D) demonstrate that our model achieves state-of-the-art performance and yields competitive generalization ability. In particular, our method achieves the highest Acc and Overall on the DTU dataset, while attaining the highest Recall and F1-score on the Tanks and Temples intermediate and advanced datasets. Moreover, our method also achieves the lowest e1 and e3 on the BlendedMVS dataset and the highest Acc and F1-score on the ETH3D dataset, surpassing all listed methods. Project website: https://github.com/zs670980918/ARAI-MVSNet
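The idea of a variable depth interval partition can be illustrated with a minimal sketch for a single pixel. This is not the paper's Adaptive Depth Interval Adjustment module: the function name `adaptive_depth_planes`, the Gaussian-shaped density, and the parameters `sigma_frac`, `floor`, and `grid` are all assumptions made for illustration. The sketch computes a Z-score for each candidate depth relative to a coarse estimate, turns it into a density, and places hypothesis planes at equal-mass steps of that density, so planes are dense near the estimate and sparse far from it.

```python
import math


def adaptive_depth_planes(d_min, d_max, d_est, num_planes,
                          sigma_frac=0.15, floor=0.1, grid=1024):
    """Place num_planes (>= 2) depth hypotheses in [d_min, d_max].

    Hypothetical illustration: planes are denser near the coarse
    estimate d_est. sigma_frac, floor and grid are assumed knobs,
    not values taken from the paper.
    """
    span = d_max - d_min
    sigma = sigma_frac * span
    step = span / (grid - 1)
    depths = [d_min + i * step for i in range(grid)]

    # Z-score of each candidate depth relative to the estimate, mapped to a
    # positive density. The floor keeps some planes far from the estimate.
    density = [math.exp(-0.5 * ((d - d_est) / sigma) ** 2) + floor
               for d in depths]

    # Cumulative mass of the density (trapezoid rule).
    cdf = [0.0]
    for i in range(1, grid):
        cdf.append(cdf[-1] + 0.5 * (density[i - 1] + density[i]) * step)
    total = cdf[-1]

    # Invert the cumulative mass at equally spaced targets, so every
    # interval between neighbouring planes holds the same mass.
    planes = []
    j = 0
    for k in range(num_planes):
        target = total * k / (num_planes - 1)
        while j < grid - 2 and cdf[j + 1] < target:
            j += 1
        frac = (target - cdf[j]) / (cdf[j + 1] - cdf[j])
        planes.append(depths[j] + frac * step)
    return planes


# Illustrative numbers only: a 425-935 depth range with a coarse estimate of 600.
planes = adaptive_depth_planes(425.0, 935.0, 600.0, 16)
gaps = [b - a for a, b in zip(planes, planes[1:])]
print("smallest interval:", round(min(gaps), 2))
print("largest interval:", round(max(gaps), 2))
```

With an equal partition, all 15 intervals in this example would be 34 units wide. Here the intervals near the estimate are narrower than that and the ones far from it are wider, while the first and last planes still span the full range.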

Multi-view depth estimation based on multi-feature aggregation for 3D reconstruction

MFF-Net: Towards Efficient Monocular Depth Completion With Multi-Modal Feature Fusion

MFE‐MVSNet: Multi‐scale feature enhancement multi‐view stereo with bi‐directional connections

FA-MSVNet: multi-scale and multi-view feature aggregation methods for stereo 3D reconstruction

Multi-View Stereo Representation Revisit: Region-Aware MVSNet

Enhanced multi view 3D reconstruction with improved MVSNet

Hybrid-MVS: Robust Multi-View Reconstruction with Hybrid Optimization of Visual and Depth Cues

Attention Aware Cost Volume Pyramid Based Multi-view Stereo Network for 3D Reconstruction

HC-MVSNet: A Probability Sampling-Based Multi-View-stereo Network with Hybrid Cascade Structure for 3D Reconstruction

NR-MVSNet: Learning Multi-View Stereo Based on Normal Consistency and Depth Refinement

EPP-MVSNet: Epipolar-assembling based Depth Prediction for Multi-view Stereo

LNMVSNet: A Low-Noise Multi-View Stereo Depth Inference Method for 3D Reconstruction

Attention-enhanced multi-source cost volume multi-view stereo

MTD-MVSNet: Multi-view Stereo Network with Multi-scale Transformer and Dual Attention

PA-MVSNet: Sparse-to-Dense Multi-View Stereo With Pyramid Attention

SA-MVSNet: Self-attention-based multi-view stereo network for 3D reconstruction of images with weak texture

ARAI-MVSNet: A multi-view stereo depth estimation network with adaptive depth range and depth interval

DSC-MVSNet: attention aware cost volume regularization based on depthwise separable convolution for multi-view stereo

OD-MVSNet: Omni-dimensional dynamic multi-view stereo network

Multi-View Depth Map Sampling for 3D Reconstruction of Natural Scene

Multi-View Stereo Network Based on Attention Mechanism and Neural Volume Rendering