Abstract: Multiple-view stereo (MVS) has potential applications in robotic operations and autonomous driving, such as unstructured environment construction and visual servoing. With assisted depth information, inertial navigation systems can achieve precise navigation, which is especially valuable when GPS fails in complex environments. Accurate depth estimation remains a challenge in low-textured or occluded regions. To reduce incorrect depth inference, this paper presents a multi-stage, pixel-visibility learning-based stereo network. Its contributions are as follows: 1) a new content-adaptive cost volume aggregation mechanism based on neighboring pixel-wise visibility is designed to produce more accurate and smoother depth map predictions at object boundaries; 2) a global convolution block and a boundary refinement block are developed to regularize the cost volume; they learn the inherent constraints of feature-matching correspondences and effectively mitigate depth estimation uncertainty in low-textured regions; 3) a new loss function is designed to measure the uncertainty of the predicted probability distribution and to improve the reliability of depth map inference. Experimental results on the indoor DTU dataset and the outdoor Tanks & Temples benchmark indicate that our method achieves competitive performance, comparable to state-of-the-art works, and has strong generalization ability.

Note to Practitioners: Multiple-view stereo (MVS) can estimate dense 3D representations of scenes and is widely used in autonomous driving, robotic navigation, virtual reality (VR), and augmented reality (AR). To address incorrect depth inference in low-textured or occluded regions, this work proposes a novel multi-stage depth prediction method based on neighboring pixel-wise visibility. Our method not only achieves accurate depth estimation for robot perception but also makes no concession to real-time performance. The proposed method therefore has good potential in 3D reconstruction, robotic navigation, and VR/AR applications, where it can provide accurate depth estimation in real time with limited memory consumption.
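To make the idea of visibility-weighted cost aggregation followed by probabilistic depth inference more concrete, here is a minimal NumPy sketch under stated assumptions; it is not the authors' network. Per-view visibility comes from a hypothetical heuristic (one minus the normalized entropy of each source view's depth distribution, averaged over a 3x3 pixel neighborhood) rather than a learned module. The names `visibility_weighted_depth` and `depth_loss_with_uncertainty` are illustrative. The loss adds a weighted entropy term to a masked L1 depth error as one plausible way to penalize uncertain (flat) distributions; the paper's actual loss may differ.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along `axis`."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def entropy(prob, axis):
    """Shannon entropy (natural log) of a distribution along `axis`."""
    return -(prob * np.log(prob + 1e-12)).sum(axis=axis)


def box_filter3(x):
    """3x3 mean over the last two (H, W) axes with edge padding.
    Assumption: a crude stand-in for learned neighboring-pixel context."""
    pad = [(0, 0)] * (x.ndim - 2) + [(1, 1), (1, 1)]
    xp = np.pad(x, pad, mode="edge")
    h, w = x.shape[-2:]
    acc = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            acc += xp[..., dy:dy + h, dx:dx + w]
    return acc / 9.0


def visibility_weighted_depth(costs, depths):
    """Hypothetical visibility-aware aggregation.

    costs:  (V, D, H, W) per-source-view matching scores (higher = better).
    depths: (D,) depth hypotheses.
    Returns depth map (H, W), probability volume (D, H, W), entropy (H, W).
    """
    num_depths = costs.shape[1]
    # Per-view probability over depth hypotheses.
    per_view_prob = softmax(costs, axis=1)
    # Heuristic visibility in [0, 1]: a peaked (low-entropy) distribution
    # suggests the pixel is visible in that view; a flat one suggests occlusion.
    vis = 1.0 - entropy(per_view_prob, axis=1) / np.log(num_depths)  # (V, H, W)
    # Smooth visibility over each pixel's 3x3 neighborhood.
    vis = box_filter3(vis)
    # Normalize weights across views, then take the weighted average of costs.
    weights = vis / (vis.sum(axis=0, keepdims=True) + 1e-12)  # (V, H, W)
    agg = (weights[:, None] * costs).sum(axis=0)              # (D, H, W)
    prob = softmax(agg, axis=0)
    # Soft-argmin style regression: expected depth under the distribution.
    depth = (prob * depths[:, None, None]).sum(axis=0)
    return depth, prob, entropy(prob, axis=0)


def depth_loss_with_uncertainty(depth, prob_entropy, gt, mask, lam=0.1):
    """Hypothetical loss: masked L1 depth error plus lam * mean entropy."""
    l1 = np.abs(depth - gt)[mask].mean()
    return l1 + lam * prob_entropy[mask].mean()


rng = np.random.default_rng(0)
depths = np.linspace(1.0, 2.0, 8)
costs = rng.normal(size=(3, 8, 4, 5))
depth, prob, ent = visibility_weighted_depth(costs, depths)
print(depth.shape, prob.shape)  # (4, 5) (8, 4, 5)
```

Down-weighting a view whose depth distribution is nearly flat keeps an occluded source view from blurring the aggregated cost, which is the intuition behind visibility-aware aggregation. A learned visibility module, as described in the abstract, would replace the entropy heuristic and the fixed 3x3 smoothing.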

DI-MVS: Learning Efficient Multi-View Stereo with Depth-Aware Iterations

Multi-View Stereo Representation Revist: Region-Aware MVSNet

Efficient Multi-view Stereo by Dynamic Cost Volume and Cross-scale Propagation

RIAV-MVS: Recurrent-Indexing an Asymmetric Volume for Multi-View Stereo

Efficient Multi-view Stereo by Iterative Dynamic Cost Volume

Miper-MVS: Multi-scale Iterative Probability Estimation with Refinement for Efficient Multi-View Stereo

NR-MVSNet: Learning Multi-View Stereo Based on Normal Consistency and Depth Refinement

Hybrid-MVS: Robust Multi-View Reconstruction with Hybrid Optimization of Visual and Depth Cues

Multi-View Stereo with Learnable Cost Metric

Sparse Prior Guided Deep Multi-View Stereo

A Light Multi-View Stereo Method with Patch-Uncertainty Awareness

Multistage Pixel-Visibility Learning with Cost Regularization for Multiview Stereo

Adaptive Cost Aggregation in Iterative Depth Estimation for Efficient Multi-view Stereo

RayMVSNet++: Learning Ray-based 1D Implicit Fields for Accurate Multi-View Stereo

SDL-MVS: View Space and Depth Deformable Learning Paradigm for Multi-View Stereo Reconstruction in Remote Sensing

DSC-MVSNet: attention aware cost volume regularization based on depthwise separable convolution for multi-view stereo

EI-MVSNet: Epipolar-Guided Multi-View Stereo Network With Interval-Aware Label

GeoMVSNet: Learning Multi-View Stereo with Geometry Perception

IB-MVS: An Iterative Algorithm for Deep Multi-View Stereo based on Binary Decisions

EMO-MVS: Error-Aware Multi-Scale Iterative Variable Optimizer for Efficient Multi-View Stereo

Visual Consistency Enhancement for Multi-view Stereo Reconstruction in Remote Sensing