Enhanced multi view 3D reconstruction with improved MVSNet

Guangchen Li, Kefeng Li, Guangyuan Zhang, Zhenfang Zhu, Peng Wang, Zhenfei Wang, Chen Fu
DOI: https://doi.org/10.1038/s41598-024-64805-y
IF: 4.6
2024-06-21
Scientific Reports
Abstract: Although 3D reconstruction is widely used in many fields as a key component of environment perception, existing technologies still leave room for improvement in 3D scene reconstruction. We propose an improved reconstruction algorithm based on the MVSNet network architecture. To extract richer pixel details from images, we replace the existing feature extraction network with a DE module integrated with a residual framework. The DE module uses ECA-Net and dilated convolution to expand the receptive field, and performs feature splicing and fusion through the residual structure to retain the global information of the original image. Moreover, attention mechanisms refine the regularization of the 3D cost volume, strengthening the integration of information across multi-scale feature volumes and thereby improving depth estimation precision. When we assessed our model on the DTU dataset, the network's 3D reconstruction achieved a completeness (comp) of 0.411 mm and an overall quality of 0.418 mm. This performance is better than that of traditional methods and other deep learning-based methods. Additionally, the visual quality of the point cloud model improves markedly. Trials on the BlendedMVS dataset indicate that our network generalizes well.
multidisciplinary sciences
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

The paper primarily aims to improve Multi-View Stereo (MVS) technology, especially when dealing with sparse textures and non-Lambertian scenes, where existing 3D reconstruction methods have certain limitations. Specifically, the paper proposes an improved algorithm based on the MVSNet network architecture, DEC-MVSNet, with the following objectives:

1. **Enhance Feature Extraction Capability**: By introducing the DE module (combining dilated convolution and ECA-Net attention mechanism), it expands the receptive field range and enriches the image feature extraction.
2. **Optimize Cost Volume Regularization Process**: Utilizing the CBAM attention mechanism, it improves the accuracy of depth estimation, thereby enhancing the overall quality of the 3D reconstruction results.
3. **Improve Model Performance**: Experimental results on the DTU dataset show that the improved model significantly outperforms traditional MVSNet and other deep learning methods in terms of completeness and overall quality.

In summary, the main purpose of this paper is to propose a new network structure, DEC-MVSNet, to enhance the robustness and accuracy of multi-view 3D reconstruction technology in complex scenes.
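
The claim that dilated convolution expands the receptive field can be illustrated with a small calculation. The sketch below is not taken from the paper: the layer stacks, kernel sizes, and dilation rates (1, 2, 3) are hypothetical values chosen for illustration, and the function names are our own. It uses the standard relation that a kernel of size k with dilation d covers k + (k - 1)(d - 1) input positions.

```python
from typing import List, Tuple

# Each layer is described as (kernel_size, dilation, stride).
Layer = Tuple[int, int, int]


def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Span of input positions covered by one dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def receptive_field(layers: List[Layer]) -> int:
    """Receptive field (in input pixels, per axis) of a stack of conv layers."""
    rf = 1    # a single output pixel sees one input pixel before any layer
    jump = 1  # distance in input pixels between adjacent outputs so far
    for kernel_size, dilation, stride in layers:
        rf += (effective_kernel(kernel_size, dilation) - 1) * jump
        jump *= stride
    return rf


# Hypothetical stacks, NOT the configuration used in DEC-MVSNet.
plain_stack = [(3, 1, 1), (3, 1, 1), (3, 1, 1)]
dilated_stack = [(3, 1, 1), (3, 2, 1), (3, 3, 1)]

print(receptive_field(plain_stack))    # 7
print(receptive_field(dilated_stack))  # 13
```

With the same number of 3x3 layers and the same parameter count, the dilated stack in this example sees 13 input pixels per axis instead of 7, which is the effect the first objective relies on.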