Abstract: Extensive studies have been conducted on multi-view stereo and stereo matching for 3D reconstruction, whereas relatively few methods have been proposed for large-scale environments. One of the main reasons is the difficulty of producing high-resolution depth/disparity maps. In this paper, we propose a dual attention-guided self-adaptive aware cascade network (DAscNet) that achieves state-of-the-art results in generating high-resolution depth/disparity maps of complex scenes by applying a cascade inference strategy to a set of input views. A pyramid cost volume fusion and a self-adaptive cost volume cascade are built upon a dual attention-guided context multi-scale feature extraction, which encodes geometric, spatial, and contextual information at gradually finer scales to achieve a robust structural representation for predictions. The dual attention-guided context multi-scale feature extraction consists of two distinct modules, both based on the attention mechanism. In the pyramid cost volume fusion, an inter-cost attention aggregation module fuses multiple low-resolution dense cost volumes to achieve a robust structural representation for initial predictions. In the self-adaptive cost volume cascade, a changeable depth/disparity range estimation module adjusts the depth/disparity search range of each stage based on the prediction from the previous stage. This module drives the network to gradually resolve complicated matching ambiguities and improves the accuracy of the predicted depth/disparity search range. Experiments on two publicly available datasets, the Tanks and Temples dataset and the DTU dataset, show that DAscNet outperforms prior work. The effectiveness of the proposed method is also supported by comparisons of accuracy, runtime, and GPU memory with other representative methods.
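The idea of narrowing the depth/disparity search range from one cascade stage to the next can be illustrated with a minimal sketch for a single pixel. The abstract does not specify how the range estimation module computes the new range, so the variance-based rule below (expected depth plus or minus a multiple of the standard deviation) is an assumption borrowed from common cascade designs, not the DAscNet module itself. The function name `next_stage_hypotheses` and the parameters `num_next` and `scale` are hypothetical.

```python
import math


def next_stage_hypotheses(hypotheses, probs, num_next, scale=1.5):
    """Return depth hypotheses for the next stage of a cascade (one pixel).

    hypotheses: depth values sampled at the previous stage.
    probs: matching probabilities for those values (need not sum to 1).
    num_next: number of hypotheses to sample at the next stage (>= 2).
    scale: assumed multiplier on the standard deviation (hypothetical).
    """
    if num_next < 2:
        raise ValueError("num_next must be at least 2")
    total = sum(probs)
    if total <= 0:
        raise ValueError("probs must have a positive sum")
    probs = [p / total for p in probs]

    # Expected depth and its spread under the previous-stage distribution.
    mean = sum(d * p for d, p in zip(hypotheses, probs))
    var = sum(p * (d - mean) ** 2 for d, p in zip(hypotheses, probs))
    std = math.sqrt(var)

    # Narrow the range around the expected depth, clamped to the old range.
    lo = max(min(hypotheses), mean - scale * std)
    hi = min(max(hypotheses), mean + scale * std)

    # Sample the narrowed range uniformly.
    step = (hi - lo) / (num_next - 1)
    return [lo + i * step for i in range(num_next)]


# Usage: a confident previous stage yields a tighter range for the next one.
coarse = [1.0, 2.0, 3.0, 4.0, 5.0]
coarse_probs = [0.0, 0.25, 0.5, 0.25, 0.0]
fine = next_stage_hypotheses(coarse, coarse_probs, num_next=4)
```

A peaked probability distribution produces a small standard deviation and therefore a tight range, while an ambiguous match keeps the range wide, which is one way a network could handle matching ambiguities stage by stage.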

A Cascade Network with Adaptive Depth Hypotheses Estimation for Multi-View Stereo and Image Three-Dimensional Reconstruction

HC-MVSNet: A Probability Sampling-Based Multi-View-stereo Network with Hybrid Cascade Structure for 3D Reconstruction

A Depth Estimation Framework Based on Unsupervised Learning and Cross-Modal Translation

DRI-MVSNet: A Depth Residual Inference Network for Multi-View Stereo Images

Adaptive aggregation and depth refinement multi-view stereo network

Multi-View Stereo Representation Revist: Region-Aware MVSNet

Attention Aware Cost Volume Pyramid Based Multi-view Stereo Network for 3D Reconstruction

Hybrid-MVS: Robust Multi-View Reconstruction with Hybrid Optimization of Visual and Depth Cues

Dual Attention-Guided Self-Adaptive Aware Cascade Network for Multi-View Stereo and Stereo Matching

OD-MVSNet: Omni-dimensional dynamic multi-view stereo network

Unsupervised multi-view stereo network based on multi-stage depth estimation

EPP-MVSNet: Epipolar-assembling based Depth Prediction for Multi-view Stereo

Attention-guided Multi-view Stereo Network for Depth Estimation

Hierarchical MVSNet with Cost Volume Separation and Fusion Based on U-shape Feature Extraction

Cascade Cost Volume for High-Resolution Multi-View Stereo and Stereo Matching

Deep Stereo using Adaptive Thin Volume Representation with Uncertainty Awareness

Recurrent Multi-view Stereo Depth Inference with Pyramid of Images

Adaptive Cost Aggregation in Iterative Depth Estimation for Efficient Multi-view Stereo

Multi-view depth estimation based on multi-feature aggregation for 3D reconstruction

Adaptive Depth Estimation for Pyramid Multi-View Stereo

3DVNet: Multi-View Depth Prediction and Volumetric Refinement