FA-MSVNet: multi-scale and multi-view feature aggregation methods for stereo 3D reconstruction
Yao Li, Yong Zhou, Jiaqi Zhao, Wen-Liang Du, Rui Yao
DOI: https://doi.org/10.1007/s11042-024-20431-4
IF: 2.577
2024-12-04
Multimedia Tools and Applications
Abstract: Stereo 3D reconstruction is pivotal in computer vision and requires effective fusion of local detail and global semantic features across multiple views. This paper addresses that challenge with a novel multi-scale, multi-view cascading network that enhances both accuracy and completeness in 3D reconstruction. First, the Multi-Scale Feature Aggregation Module (SFAM) integrates features at various scales within each view to capture fine-grained details and improve point cloud resolution. Second, the Feature Enhancement Module (FEM) enhances feature representation by aggregating spatial context, improving point cloud boundary integrity and reducing occlusion artifacts. Third, the Multi-View Feature Aggregation Module (VFAM) uses rectified linear attention to merge global contextual information across views, ensuring more coherent and semantically accurate reconstructions. Extensive experiments on three benchmark datasets show improvements of 3.1%, 8.0%, and 3.1%, respectively, demonstrating that our approach achieves competitive results.
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering
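The abstract states that VFAM merges context across views with rectified linear attention but gives no formulation. The sketch below shows one common reading of that term, offered as an assumption rather than the authors' method: kernelized linear attention in which a ReLU feature map replaces softmax. The names `relu_map` and `linear_attention`, the `eps` offset, and the reference-view/source-view roles of `q`, `k`, and `v` are all hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def relu_map(x: Matrix, eps: float = 1e-6) -> Matrix:
    """Non-negative feature map: ReLU plus a small offset (assumed) so that
    the normalizer below can never be zero."""
    return [[max(value, 0.0) + eps for value in row] for row in x]


def linear_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Linear attention with a ReLU feature map (illustrative only).

    q: (n_q, d)   queries, e.g. reference-view features (assumed role)
    k: (n_k, d)   keys, e.g. source-view features (assumed role)
    v: (n_k, d_v) values paired with the keys
    """
    phi_q, phi_k = relu_map(q), relu_map(k)
    n_k, d, d_v = len(phi_k), len(phi_k[0]), len(v[0])

    # Aggregate keys and values once: kv[a][b] = sum_j phi_k[j][a] * v[j][b].
    kv = [
        [sum(phi_k[j][a] * v[j][b] for j in range(n_k)) for b in range(d_v)]
        for a in range(d)
    ]
    # Per-channel key sums, used to normalize each query's weights.
    k_sum = [sum(phi_k[j][a] for j in range(n_k)) for a in range(d)]

    out = []
    for row in phi_q:
        norm = sum(row[a] * k_sum[a] for a in range(d))
        out.append(
            [sum(row[a] * kv[a][b] for a in range(d)) / norm for b in range(d_v)]
        )
    return out


# Tiny usage example with made-up numbers.
q = [[1.0, -2.0], [0.5, 0.3]]
k = [[0.2, 1.0], [-1.0, 0.4], [0.7, 0.7]]
v = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
fused = linear_attention(q, k, v)
```

Because the key-value summary `kv` is computed once and reused for every query, the cost grows linearly with the number of tokens instead of quadratically, which is the usual motivation for linear attention when matching dense features across several views. Each output row is a weighted average of the value rows with non-negative weights that sum to one.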