Abstract: Recent learning-based methods demonstrate a strong ability to estimate depth for multi-view stereo (MVS) reconstruction. However, most of these methods extract features directly via regular or deformable convolutions, and few works consider the alignment of receptive fields between views when constructing the cost volume. By analyzing the constraints and inference of previous MVS networks, we find that some shortcomings still hinder performance. To address these issues, we propose an Epipolar-Guided Multi-View Stereo Network with Interval-Aware Label (EI-MVSNet), which includes an epipolar-guided volume construction module and an interval-aware depth estimation module in a unified architecture for MVS. The proposed EI-MVSNet enjoys several merits. First, in the epipolar-guided volume construction module, we construct the cost volume with features from aligned receptive fields between different pairs of reference and source images via epipolar-guided convolutions, which take rotation and scale changes into account. Second, in the interval-aware depth estimation module, we attempt to supervise the cost volume directly and make depth estimation independent of extraneous values by perceiving the upper and lower boundaries, which can achieve fine-grained predictions and enhance the reasoning ability of the network. Extensive experimental results on two standard benchmarks demonstrate that our EI-MVSNet performs favorably against state-of-the-art MVS methods. Specifically, our EI-MVSNet ranks on both intermediate and advanced subsets of the Tanks and Temples benchmark, which verifies the high precision and strong robustness of our model.
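The abstract does not spell out how the interval-aware label is built, so the following is only an illustrative sketch of one plausible reading of "perceiving the upper and lower boundaries": the ground-truth depth is described by the two neighbouring depth hypotheses that bracket it, plus a relative offset inside that interval, so hypotheses far from the true depth play no role. The function names `interval_label` and `depth_from_interval` are hypothetical, and the assumption of strictly increasing depth hypotheses is ours, not the paper's.

```python
from bisect import bisect_right


def interval_label(depth_hypotheses, gt_depth):
    """Locate the interval of hypotheses that brackets gt_depth.

    Assumption (not from the paper): depth_hypotheses is a strictly
    increasing list of at least two depth values.
    Returns (lower_index, upper_index, offset) with offset in [0, 1],
    or None when gt_depth lies outside the hypothesis range.
    """
    if len(depth_hypotheses) < 2:
        raise ValueError("need at least two depth hypotheses")
    if not depth_hypotheses[0] <= gt_depth <= depth_hypotheses[-1]:
        return None  # no bracketing interval exists, so no label

    # First index whose hypothesis is strictly greater than gt_depth,
    # clamped so that gt_depth == last hypothesis still has an upper bound.
    upper = min(bisect_right(depth_hypotheses, gt_depth),
                len(depth_hypotheses) - 1)
    lower = upper - 1

    span = depth_hypotheses[upper] - depth_hypotheses[lower]
    offset = (gt_depth - depth_hypotheses[lower]) / span
    return lower, upper, offset


def depth_from_interval(depth_hypotheses, lower, upper, offset):
    """Recover a depth from the two boundary hypotheses and the offset only."""
    d_lo = depth_hypotheses[lower]
    d_hi = depth_hypotheses[upper]
    return d_lo + offset * (d_hi - d_lo)


if __name__ == "__main__":
    hypotheses = [1.0, 2.0, 3.0, 4.0]
    label = interval_label(hypotheses, 2.5)
    print(label)
    print(depth_from_interval(hypotheses, *label))
```

In this sketch the recovered depth depends only on the two boundary hypotheses, which is one way a depth estimate could be made independent of the remaining values in the cost volume. A regression over all hypotheses, by contrast, lets every depth bin contribute to the result.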

Vis-MVSNet: Visibility-Aware Multi-view Stereo Network

Visibility-Aware Point-Based Multi-View Stereo Network

Multi-View Stereo Representation Revisit: Region-Aware MVSNet

Multi-View Stereo Network Based on Attention Mechanism and Neural Volume Rendering

PVSNet: Pixelwise Visibility-Aware Multi-View Stereo Network

Attention Aware Cost Volume Pyramid Based Multi-view Stereo Network for 3D Reconstruction

Attention-enhanced multi-source cost volume multi-view stereo

EI-MVSNet: Epipolar-Guided Multi-View Stereo Network With Interval-Aware Label

Multi-View Stereo Network with attention thin volume

Bi-ClueMVSNet: Learning Bidirectional Occlusion Clues for Multi-View Stereo

EPP-MVSNet: Epipolar-assembling based Depth Prediction for Multi-view Stereo

DSC-MVSNet: attention aware cost volume regularization based on depthwise separable convolution for multi-view stereo

MVSNet: Depth Inference for Unstructured Multi-view Stereo

NR-MVSNet: Learning Multi-View Stereo Based on Normal Consistency and Depth Refinement

Adaptive Cost Aggregation in Iterative Depth Estimation for Efficient Multi-view Stereo

MTD-MVSNet: Multi-view Stereo Network with Multi-scale Transformer and Dual Attention

GeoMVSNet: Learning Multi-View Stereo with Geometry Perception

OD-MVSNet: Omni-dimensional dynamic multi-view stereo network

Real-Time Unsupervised Multi-View Depth Estimation Network for Virtual View Synthesis

MVSTER: Epipolar Transformer for Efficient Multi-View Stereo

Uanet: uncertainty-aware cost volume aggregation-based multi-view stereo for 3D reconstruction