DSEM-NeRF: Multimodal feature fusion and global-local attention for enhanced 3D scene reconstruction

Dong Liu, Zhiyong Wang, Peiyuan Chen
DOI: https://doi.org/10.1016/j.inffus.2024.102752
IF: 18.6
2024-10-25
Information Fusion
Abstract: 3D scene understanding often suffers from insufficient detail capture and poor adaptability to multi-view changes. To address this, we propose DSEM-NeRF, a NeRF-based 3D scene understanding model that improves the reconstruction quality of complex scenes through multimodal feature fusion and a global-local attention mechanism. DSEM-NeRF extracts multimodal features such as color, depth, and semantics from multi-view 2D images, and it captures key regions by dynamically adjusting the importance of each feature. Experimental results show that DSEM-NeRF outperforms many existing models on the LLFF and DTU datasets, with reported PSNR values of 20.01, 23.56, and 24.58 and an SSIM of 0.834. In particular, it shows strong robustness in complex scenes and under multi-view changes, verifying the effectiveness and reliability of the model.
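The abstract measures reconstruction quality with PSNR and describes fusing color, depth, and semantic features by dynamically weighting their importance. The sketch below, using only the Python standard library, shows the standard PSNR formula (assuming pixel values scaled to [0, 1]) and one generic way such weighting could work: a softmax over per-modality scores followed by a weighted sum. The names `psnr`, `softmax`, and `fuse_modalities` and the softmax gating itself are illustrative assumptions, not the authors' DSEM-NeRF implementation, which the abstract does not specify.

```python
import math
from typing import List, Sequence


def psnr(pred: Sequence[float], target: Sequence[float], max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE)."""
    if not pred or len(pred) != len(target):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical signals: PSNR is unbounded
    return 10.0 * math.log10(max_val ** 2 / mse)


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_modalities(features: List[List[float]], scores: Sequence[float]) -> List[float]:
    """Hypothetical fusion: weight each modality's feature vector by softmax(scores).

    features[k] is the feature vector of modality k (e.g. color, depth, semantics);
    scores[k] is an importance score for that modality, assumed to come from
    some learned scoring function not shown here.
    """
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]


if __name__ == "__main__":
    # Two pixels, each off by 0.1: MSE = 0.01, so PSNR = 20 dB.
    print(psnr([0.5, 0.5], [0.6, 0.4]))
    # Three toy modality vectors; the second modality gets the highest score.
    color, depth, semantic = [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]
    print(fuse_modalities([color, depth, semantic], [0.1, 2.0, 0.5]))
```

Softmax gating keeps the weights positive and summing to one, so the fused vector stays a convex combination of the modality features; a real model would typically compute the scores per sample or per region rather than fixing them by hand.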
computer science, artificial intelligence, theory & methods