Abstract: Because of regions with weak textures or non-Lambertian surfaces, feature matching in learning-based Multi-View Stereo (MVS) algorithms often produces incorrect matches, which leads to a flawed cost volume and incomplete scene reconstruction. To address this limitation, this paper introduces an MVS network based on an attention mechanism and neural volume rendering. First, we employ a multi-scale feature extraction module based on dilated convolution and attention mechanisms. This module enables the network to accurately model inter-pixel dependencies and to focus on the information most important for robust feature matching. Second, to mitigate the impact of the flawed cost volume, we build a neural volume rendering network based on multi-view semantic features and a neural encoding volume. By introducing a reference-view rendering loss, we infer the 3D scene geometry, enabling the network to learn geometric information beyond what the cost volume represents. Additionally, we apply a depth consistency loss to maintain geometric consistency across the networks. Experimental results show that on the DTU dataset, compared with CasMVSNet, the completeness of the reconstructions improves by 23.1% and the Overall score by 7.3%. On the Intermediate subset of the Tanks and Temples dataset, the average F-score of the reconstructions is 58.00, outperforming the other compared networks and demonstrating superior reconstruction performance and strong generalization capability.
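The abstract does not give the rendering equations or loss formulas. As a rough illustration only, the sketch below assumes the standard NeRF-style alpha compositing commonly used for neural volume rendering, plus plain L1 forms for the reference-view rendering loss and the depth consistency loss. The function names (`composite`, `reference_view_loss`, `depth_consistency_loss`) and the L1 choice are hypothetical and do not reproduce the paper's implementation. In a real network, the per-sample densities and colours would be decoded from the neural encoding volume and multi-view features.

```python
import numpy as np

def composite(sigmas, colors, t_vals):
    """Alpha-composite samples along rays (assumed NeRF-style rendering).

    sigmas: (R, N) non-negative densities; colors: (R, N, 3) RGB per sample;
    t_vals: (R, N) sample depths, increasing along N.
    Returns rendered RGB (R, 3), rendered depth (R,), and weights (R, N).
    """
    # Distance between consecutive samples; the last interval is treated as effectively infinite.
    deltas = np.diff(t_vals, axis=-1)
    far = np.full(t_vals.shape[:-1] + (1,), 1e10)
    deltas = np.concatenate([deltas, far], axis=-1)
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # Exclusive cumulative product: T_i = prod_{j<i} (1 - alpha_j).
    ones = np.ones(t_vals.shape[:-1] + (1,))
    trans = np.cumprod(np.concatenate([ones, 1.0 - alphas[..., :-1]], axis=-1), axis=-1)
    weights = alphas * trans
    rgb = (weights[..., None] * colors).sum(axis=-2)
    depth = (weights * t_vals).sum(axis=-1)
    return rgb, depth, weights

def reference_view_loss(rendered_rgb, ref_rgb):
    # Hypothetical L1 photometric loss between rendered and reference-view pixel colours.
    return float(np.abs(rendered_rgb - ref_rgb).mean())

def depth_consistency_loss(rendered_depth, mvs_depth, valid):
    # Hypothetical L1 loss between rendered depth and the MVS branch's depth on valid pixels.
    valid = np.asarray(valid).astype(bool)
    if not valid.any():
        return 0.0
    return float(np.abs(rendered_depth - mvs_depth)[valid].mean())

if __name__ == "__main__":
    # One ray with a single opaque red sample at depth 2.0.
    t = np.array([[1.0, 2.0, 3.0]])
    sigma = np.array([[0.0, 1e3, 0.0]])
    col = np.zeros((1, 3, 3))
    col[0, 1] = [1.0, 0.0, 0.0]
    rgb, depth, w = composite(sigma, col, t)
    print(rgb, depth)  # rendered colour close to red, rendered depth close to 2.0
```

Because the rendered depth is a weighted sum of sample depths, it can be compared directly with the depth map that the cost-volume branch regresses. This is what allows a consistency term to couple the two networks under the assumptions above.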

Self-Supervised Multi-view Stereo Via Adjacent Geometry Guided Volume Completion

Self-supervised Multi-view Stereo Via Inter and Intra Network Pseudo Depth

Multi-View Stereo Representation Revisit: Region-Aware MVSNet

Geometric Prior-Guided Self-Supervised Learning for Multi-View Stereo

Self-supervised Multi-view Stereo Via Effective Co-Segmentation and Data-Augmentation

Self-Constructing Stereo Correspondences for Unsupervised Multi-View Stereo

GeoMVSNet: Learning Multi-View Stereo with Geometry Perception

Self-supervised Multi-view Stereo Via View Synthesis Representation Consistency

Sparse Prior Guided Deep Multi-View Stereo

Geometry-Enhanced Attentive Multi-View Stereo for Challenging Matching Scenarios

Unsupervised multi-view stereo network based on multi-stage depth estimation

Self-Supervised Multi-View Stereo with Adaptive Depth Priors

High-Quality Depth Recovery Via Interactive Multi-view Stereo

Dense Multiview Stereo Based On Image Texture Enhancement

A contrastive learning based unsupervised multi-view stereo with multi-stage self-training strategy

Learning Unsupervised Multi-View Stereopsis via Robust Photometric Consistency

Multi-View Stereo Network Based on Attention Mechanism and Neural Volume Rendering

Focusing on Cross Views Improves Reconstruction in Unsupervised Multi-View Stereo

Two-stage Self-supervised MVS Network Using Adaptive Depth Sampling

Confidence-Based Large-Scale Dense Multi-View Stereo

Confidence-Guided Planar-Recovering Multiview Stereo for Weakly Textured Plane of High-Resolution Image Scenes