Abstract: Extensive studies have been conducted on multi-view stereo and stereo matching for 3D reconstruction, whereas relatively few methods have been proposed for large-scale environments. One of the main reasons is the difficulty of producing high-resolution depth/disparity maps. In this paper, we propose a dual attention-guided self-adaptive aware cascade network (DAscNet) that achieves state-of-the-art results in generating high-resolution depth/disparity maps of complex scenes by applying a cascade inference strategy to a set of input views. A pyramid cost volume fusion and a self-adaptive cost volume cascade are built upon a dual attention-guided context multi-scale feature extraction, which encodes geometric, spatial, and contextual information at gradually finer scales to achieve a robust structural representation for predictions. The dual attention-guided context multi-scale feature extraction consists of two distinct modules, both based on the attention mechanism. In the pyramid cost volume fusion, an inter-cost attention aggregation module fuses multiple low-resolution dense cost volumes to achieve a robust structural representation for the initial predictions. In the self-adaptive cost volume cascade, a changeable depth/disparity range estimation module adjusts the depth/disparity search range interval of each stage based on the prediction information from the previous stage. This module drives the network to gradually resolve complicated matching ambiguities and improves the accuracy of the predicted depth/disparity search range interval. Experiments on two publicly available datasets, the Tanks and Temples dataset and the DTU dataset, show that DAscNet outperforms prior work. The effectiveness of the proposed method is further supported by comparisons of accuracy, runtime, and GPU memory against other representative methods.
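
The abstract does not specify how the range estimation module computes the next search interval. As a rough illustration of the general idea only, the sketch below narrows a single pixel's depth range using the mean and standard deviation of the previous stage's probability distribution over depth hypotheses. This variance-based rule, the function names `next_stage_range` and `sample_hypotheses`, and the parameters `k` and `min_half_width` are all assumptions made for illustration, not the authors' method.

```python
import math


def next_stage_range(hypotheses, probabilities, k=1.5, min_half_width=0.0):
    """Return (low, high) search bounds for one pixel in the next stage.

    Assumption: the interval is mean +/- k * std of the previous stage's
    distribution, clamped to the previous range. Hypothetical rule.
    """
    if len(hypotheses) != len(probabilities):
        raise ValueError("hypotheses and probabilities must have equal length")
    total = sum(probabilities)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    probs = [p / total for p in probabilities]  # normalise defensively

    # Expected depth and its variance under the previous-stage distribution.
    mean = sum(d * p for d, p in zip(hypotheses, probs))
    variance = sum(p * (d - mean) ** 2 for d, p in zip(hypotheses, probs))

    # A confident (peaked) distribution yields a narrow interval.
    half_width = max(k * math.sqrt(variance), min_half_width)
    low = max(mean - half_width, min(hypotheses))
    high = min(mean + half_width, max(hypotheses))
    return low, high


def sample_hypotheses(low, high, count):
    """Uniformly place `count` depth hypotheses in [low, high]."""
    if count < 2:
        return [(low + high) / 2.0]
    step = (high - low) / (count - 1)
    return [low + i * step for i in range(count)]


if __name__ == "__main__":
    depths = [1.0, 2.0, 3.0, 4.0, 5.0]
    peaked = [0.05, 0.10, 0.70, 0.10, 0.05]   # confident previous stage
    uniform = [0.2, 0.2, 0.2, 0.2, 0.2]       # ambiguous previous stage
    print(next_stage_range(depths, peaked))
    print(next_stage_range(depths, uniform))
```

Under this assumed rule, a peaked distribution produces a tighter interval for the next stage, while an ambiguous one keeps the full previous range, which mirrors the coarse-to-fine behaviour the abstract describes.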

SCSCN: A Separated Channel-Spatial Convolution Net with Attention for Single-View Reconstruction.

Convolutional Neural Network Based Computational Imaging Spectroscopy

SST: Real-time End-to-end Monocular 3D Reconstruction via Sparse Spatial-Temporal Guidance

3D Former: Monocular Scene Reconstruction with 3D SDF Transformers

3D VAE-Attention Network: A Parallel System for Single-view 3D Reconstruction.

What Do Single-view 3D Reconstruction Networks Learn?

Object Reconstruction Based on Attentive Recurrent Network from Single and Multiple Images

Scanet: Spatial-Channel Attention Network For 3d Object Detection

Single-view 3D Reconstruction Algorithm Based on View-aware

PA-MVSNet: Sparse-to-Dense Multi-View Stereo With Pyramid Attention

Attention Aware Cost Volume Pyramid Based Multi-view Stereo Network for 3D Reconstruction

Enhanced multi view 3D reconstruction with improved MVSNet

Single-view 3D reconstruction via dual attention

Single View 3D Reconstruction with Category Information Learning

Vanet - a View Attention Guided Network for 3d Reconstruction from Single and Multi-View Images.

3D Reconstruction for Multi-view Objects

A Single Stage and Single View 3D Point Cloud Reconstruction Network Based on DetNet

Semantic Based Autoencoder-Attention 3D Reconstruction Network.

NeuralRecon: Real-Time Coherent 3D Scene Reconstruction from Monocular Video

Dual Attention-Guided Self-Adaptive Aware Cascade Network for Multi-View Stereo and Stereo Matching

High-Quality Textured 3D Shape Reconstruction with Cascaded Fully Convolutional Networks