Enhanced multi view 3D reconstruction with improved MVSNet

Guangchen Li, Kefeng Li, Guangyuan Zhang, Zhenfang Zhu, Peng Wang, Zhenfei Wang, Chen Fu

DOI: https://doi.org/10.1038/s41598-024-64805-y

IF: 4.6

2024-06-21

Scientific Reports

Abstract: Although 3D reconstruction is widely used in many fields as a key component of environment perception, existing techniques still leave room for improvement in 3D scene reconstruction. We propose an improved reconstruction algorithm based on the MVSNet network architecture. To extract richer pixel-level detail from images, we replace the original feature extraction network with a DE module built on a residual structure. The DE module uses ECA-Net and dilated convolution to expand the receptive field, and it splices and fuses features through the residual structure to retain the global information of the original image. In addition, attention mechanisms are applied to the regularization of the 3D cost volume, which strengthens the integration of information across multi-scale feature volumes and improves depth estimation precision. Evaluated on the DTU dataset, the network achieves a completeness (comp) of 0.411 mm and an overall score of 0.418 mm (lower is better for both), outperforming traditional methods and other deep learning-based methods. The reconstructed point clouds are also visibly improved. Experiments on the BlendedMVS dataset indicate that the network generalizes well.

multidisciplinary sciences

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

The paper primarily aims to improve Multi-View Stereo (MVS) technology, especially when dealing with sparse textures and non-Lambertian scenes, where existing 3D reconstruction methods have certain limitations. Specifically, the paper proposes DEC-MVSNet, an improved algorithm based on the MVSNet network architecture, with the following objectives:

1. **Enhance Feature Extraction Capability**: By introducing the DE module (combining dilated convolution and the ECA-Net attention mechanism), it expands the receptive field and enriches image feature extraction.
2. **Optimize Cost Volume Regularization Process**: Utilizing the CBAM attention mechanism, it improves the accuracy of depth estimation, thereby enhancing the overall quality of the 3D reconstruction results.
3. **Improve Model Performance**: Experimental results on the DTU dataset show that the improved model significantly outperforms the original MVSNet and other deep learning methods in terms of completeness and overall quality.

In summary, the main purpose of this paper is to propose a new network structure, DEC-MVSNet, to enhance the robustness and accuracy of multi-view 3D reconstruction technology in complex scenes.
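
To make the two ingredients of the DE module more concrete, here is a minimal pure-Python sketch. It is not the authors' implementation. It shows (a) how dilation enlarges the receptive field of stacked stride-1 convolutions and (b) an ECA-style channel attention combined with a residual connection. The function names, the dilation rates (1, 2, 3), the uniform 1D kernel standing in for learned weights, and the additive form of the residual are all assumptions made for illustration; none of them is specified in the summary above.

```python
import math


def receptive_field(layers):
    """Receptive field of stacked stride-1 convolutions.

    layers: list of (kernel_size, dilation) pairs (hypothetical values).
    Each layer widens the field by (kernel_size - 1) * dilation.
    """
    rf = 1
    for kernel_size, dilation in layers:
        rf += (kernel_size - 1) * dilation
    return rf


def eca_weights(channel_means, k=3):
    """ECA-style attention weights from per-channel global averages.

    A 1D convolution of size k slides across neighbouring channels
    (zero-padded at the ends), followed by a sigmoid. The uniform kernel
    1/k is a placeholder for weights that would normally be learned.
    """
    num_channels = len(channel_means)
    pad = k // 2
    kernel = [1.0 / k] * k
    weights = []
    for i in range(num_channels):
        s = 0.0
        for j in range(k):
            idx = i + j - pad
            if 0 <= idx < num_channels:
                s += kernel[j] * channel_means[idx]
        weights.append(1.0 / (1.0 + math.exp(-s)))
    return weights


def eca_residual(feature_maps, k=3):
    """Rescale each channel by its ECA weight and add the input back.

    feature_maps: list of channels, each a flat list of floats.
    The additive residual (x + w * x) is an assumed fusion rule.
    """
    means = [sum(ch) / len(ch) for ch in feature_maps]
    weights = eca_weights(means, k)
    return [[x + w * x for x in ch] for ch, w in zip(feature_maps, weights)]


if __name__ == "__main__":
    plain = receptive_field([(3, 1), (3, 1), (3, 1)])
    dilated = receptive_field([(3, 1), (3, 2), (3, 3)])
    print("receptive field, plain vs dilated:", plain, dilated)
    print(eca_residual([[2.0, 2.0], [0.0, 0.0]]))
```

Three plain 3x3 layers cover 7 pixels per axis, while the same three layers with the assumed dilations 1, 2, 3 cover 13 without adding parameters. This is the sense in which dilated convolution "expands the receptive field". The ECA step adds only k weights regardless of channel count, which is why it is often described as lightweight.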

Multi-View Stereo Representation Revisit: Region-Aware MVSNet

MFE‐MVSNet: Multi‐scale feature enhancement multi‐view stereo with bi‐directional connections

Multi-View Stereo Network Based on Attention Mechanism and Neural Volume Rendering

Attention Aware Cost Volume Pyramid Based Multi-view Stereo Network for 3D Reconstruction

Multi-view depth estimation based on multi-feature aggregation for 3D reconstruction

HC-MVSNet: A Probability Sampling-Based Multi-View-stereo Network with Hybrid Cascade Structure for 3D Reconstruction

OD-MVSNet: Omni-dimensional dynamic multi-view stereo network

Hybrid-MVS: Robust Multi-View Reconstruction with Hybrid Optimization of Visual and Depth Cues

EPP-MVSNet: Epipolar-assembling based Depth Prediction for Multi-view Stereo

PA-MVSNet: Sparse-to-Dense Multi-View Stereo With Pyramid Attention

Improved Multiview Decomposition for Single-Image High-Resolution 3D Object Reconstruction

EI-MVSNet: Epipolar-Guided Multi-View Stereo Network With Interval-Aware Label

An Improved TransMVSNet Algorithm for Three-Dimensional Reconstruction in the Unmanned Aerial Vehicle Remote Sensing Domain

LNMVSNet: A Low-Noise Multi-View Stereo Depth Inference Method for 3D Reconstruction

FA-MSVNet: multi-scale and multi-view feature aggregation methods for stereo 3D reconstruction

DP-MVS: Detail Preserving Multi-View Surface Reconstruction of Large-Scale Scenes

MTD-MVSNet: Multi-view Stereo Network with Multi-scale Transformer and Dual Attention

Attention-enhanced multi-source cost volume multi-view stereo

NR-MVSNet: Learning Multi-View Stereo Based on Normal Consistency and Depth Refinement

3D Reconstruction for Multi-view Objects