MM-NeRF: Large-Scale Scene Representation with Multi-Resolution Hash Grid and Multi-View Priors Features

Bo Dong, Kaiqiang Chen, Zhirui Wang, Menglong Yan, Jiaojiao Gu, Xian Sun
DOI: https://doi.org/10.3390/electronics13050844
IF: 2.9
2024-02-23
Electronics
Abstract: Reconstructing large-scale scenes using Neural Radiance Fields (NeRFs) is a research hotspot in 3D computer vision. Existing MLP (multi-layer perceptron)-based methods often suffer from issues of underfitting and a lack of fine details in rendering large-scale scenes. Popular solutions are to divide the scene into small areas for separate modeling or to increase the layer scale of the MLP network. However, the subsequent problem is that the training cost increases. Moreover, reconstructing large scenes, unlike object-scale reconstruction, involves a geometrically considerable increase in the quantity of view data if the prior information of the scene is not effectively utilized. In this paper, we propose an innovative method named MM-NeRF, which integrates efficient hybrid features into the NeRF framework to enhance the reconstruction of large-scale scenes. We propose employing a dual-branch feature capture structure, comprising a multi-resolution 3D hash grid feature branch and a multi-view 2D prior feature branch. The 3D hash grid feature models geometric details, while the 2D prior feature supplements local texture information. Our experimental results show that such integration is sufficient to render realistic novel views with fine details, forming a more accurate geometric representation. Compared with representative methods in the field, our method significantly improves the PSNR (Peak Signal-to-Noise Ratio) by approximately 5%. This remarkable progress underscores the outstanding contribution of our method in the field of large-scene radiance field reconstruction.
Engineering, Electrical & Electronic; Computer Science, Information Systems; Physics, Applied
What problem does this paper attempt to address?
The paper addresses two problems that arise when Neural Radiance Fields (NeRFs) are used to reconstruct large-scale scenes: the loss of fine detail and the large amount of view data required. Specifically, existing methods based on a multi-layer perceptron (MLP) often underfit and lack fine details when rendering large-scale scenes. Common solutions are to divide the scene into small areas that are modeled separately or to increase the number of layers of the MLP network, but both increase the training cost. In addition, unlike object-scale reconstruction, large-scale reconstruction involves a geometric increase in the quantity of view data if the prior information of the scene is not used effectively. The paper therefore proposes MM-NeRF, which improves the reconstruction of large-scale scenes by integrating efficient hybrid features into the NeRF framework.

### Main contributions of the paper:

1. **Propose a new NeRF variant**: MM-NeRF, designed specifically for modeling large-scale unbounded scenes.
2. **Introduce a new pipeline**: Combines 3D hash grid features and scene prior features to achieve efficient and accurate large-scene modeling.
3. **Reduce dependence on multi-view data**: Achieves a good scene synthesis representation without requiring a large number of views.

### Method overview:

- **Multi-resolution hash grid features**: Capture as many 3D details as possible through multi-resolution hash grids.
- **Multi-view prior features**: Supplement missing prior information through an image encoder.
- **Fusion features**: Feed these features, together with the positional encoding (PE), into the decoder to predict the density \(\sigma\) and color value \(c\), then generate the image color through volume rendering.
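The final step of the method overview, turning per-sample density \(\sigma\) and color \(c\) into a pixel color, can be illustrated with the standard NeRF volume-rendering quadrature. The sketch below is a minimal pure-Python version; it assumes MM-NeRF composites samples in the usual NeRF way, which the summary above does not spell out, and the function name `composite_ray` and its arguments are hypothetical.

```python
import math


def composite_ray(sigmas, colors, deltas):
    """Alpha-composite samples along one ray (standard NeRF quadrature).

    sigmas: densities sigma_i >= 0 predicted at each sample
    colors: (r, g, b) tuples c_i predicted at each sample
    deltas: distances delta_i between adjacent samples
    Returns the pixel color and the transmittance left after the last sample.
    """
    transmittance = 1.0  # T_i: fraction of light that still reaches sample i
    pixel = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        weight = transmittance * alpha          # contribution of sample i
        for k in range(3):
            pixel[k] += weight * color[k]
        transmittance *= 1.0 - alpha            # light remaining behind sample i
    return tuple(pixel), transmittance


if __name__ == "__main__":
    # Hypothetical values: a faint green sample in front of a dense red one.
    rgb, remaining = composite_ray(
        sigmas=[0.2, 5.0],
        colors=[(0.0, 1.0, 0.0), (1.0, 0.0, 0.0)],
        deltas=[0.5, 0.5],
    )
    print(rgb, remaining)
```

The photometric loss mentioned in the summary would then compare this composited color with the ground-truth pixel, for example as a squared difference.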
### Key technical points:

- **Multi-resolution hash grid**: Map 3D points to hash-table entries through a hash function. This supports high-resolution scene representation while avoiding a large increase in the number of parameters.
- **Multi-view prior feature extraction**: Use a convolutional neural network (CNN) and a Transformer to extract features aligned across views, then project 3D points onto the 2D feature planes and sample them bilinearly to form the prior features.
- **NeRF rendering network**: Combine an MLP and a Transformer, synthesize new views through volume rendering, and optimize the model by minimizing the photometric loss.

### Experimental results:

- **Quantitative evaluation**: On the Google Scene Dataset and the UrbanScene3D Dataset, MM-NeRF outperforms other methods on the PSNR, SSIM, and LPIPS metrics, especially on large-scale urban scenes.
- **Qualitative evaluation**: Comparative experiments demonstrate the advantages of MM-NeRF in detail preservation and visual quality, especially for complex textures and geometric structures.

In conclusion, by combining multi-resolution hash grids and multi-view prior features, the paper proposes an efficient and accurate large-scale scene reconstruction method that significantly improves the performance of NeRF on large-scale scenes.
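The hash-grid lookup listed under the key technical points can be sketched as follows. This is a minimal illustration that assumes the Instant-NGP style of multi-resolution hash encoding (geometric growth of grid resolution per level, XOR-of-primes spatial hash); the summary does not confirm that MM-NeRF uses exactly this scheme, and the function names and parameter values are hypothetical.

```python
import math

# Prime multipliers of the Instant-NGP spatial hash (assumed scheme).
PRIMES = (1, 2654435761, 805459861)


def level_resolutions(n_min, n_max, num_levels):
    """Grid resolution per level, growing geometrically from n_min towards n_max."""
    growth = math.exp((math.log(n_max) - math.log(n_min)) / (num_levels - 1))
    return [math.floor(n_min * growth ** level) for level in range(num_levels)]


def hash_index(vertex, table_size):
    """Map an integer grid vertex (ix, iy, iz) to a slot of a fixed-size table."""
    h = 0
    for coord, prime in zip(vertex, PRIMES):
        h ^= coord * prime
    return h % table_size


def vertex_slots(point, resolution, table_size):
    """Table slots of the 8 corners of the grid cell containing a point in [0, 1)^3."""
    base = [math.floor(p * resolution) for p in point]
    slots = []
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                corner = (base[0] + dx, base[1] + dy, base[2] + dz)
                slots.append(hash_index(corner, table_size))
    return slots


if __name__ == "__main__":
    # Hypothetical settings: 6 levels from 16 to 512 cells per axis, 2**14 slots each.
    for resolution in level_resolutions(16, 512, 6):
        print(resolution, vertex_slots((0.3, 0.5, 0.7), resolution, 2 ** 14))
```

Because the table size is fixed per level, the parameter count stays bounded however fine the grid becomes; hash collisions at fine levels are the price. A full encoder would additionally store a learnable feature vector in each slot and trilinearly interpolate the eight corner features, which is omitted here.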