GeoTransfer: Generalizable Few-Shot Multi-View Reconstruction via Transfer Learning

Shubhendu Jena, Franck Multon, Adnane Boukhayma
2024-09-29
Abstract: This paper presents a novel approach for sparse 3D reconstruction by leveraging the expressive power of Neural Radiance Fields (NeRFs) and fast transfer of their features to learn accurate occupancy fields. Existing 3D reconstruction methods from sparse inputs still struggle with capturing intricate geometric details and can suffer from limitations in handling occluded regions. On the other hand, NeRFs excel in modeling complex scenes but do not offer a means to extract meaningful geometry. Our proposed method offers the best of both worlds by transferring the information encoded in NeRF features to derive an accurate occupancy field representation. We utilize a pre-trained, generalizable state-of-the-art NeRF network to capture detailed scene radiance information, and rapidly transfer this knowledge to train a generalizable implicit occupancy network. This process helps in leveraging the knowledge of the scene geometry encoded in the generalizable NeRF prior and refining it to learn occupancy fields, facilitating a more precise generalizable representation of 3D space. The transfer learning approach leads to a dramatic reduction in training time, by orders of magnitude (i.e., from several days to 3.5 hours), obviating the need to train generalizable sparse surface reconstruction methods from scratch. Additionally, we introduce a novel loss on volumetric rendering weights that helps in the learning of accurate occupancy fields, along with a normal loss that helps in global smoothing of the occupancy fields. We evaluate our approach on the DTU dataset and demonstrate state-of-the-art performance in terms of reconstruction accuracy, especially in challenging scenarios with sparse input data and occluded regions. We furthermore demonstrate the generalization capabilities of our method by showing qualitative results on the BlendedMVS dataset without any retraining.
Computer Vision and Pattern Recognition
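The abstract names a loss on volumetric rendering weights and a normal loss for global smoothing of the occupancy field, but gives no formulas. The sketch below is only an illustration under stated assumptions: it assumes UNISURF-style occupancy rendering, where sample \( i \) on a ray receives weight \( w_i = o_i \prod_{j<i} (1 - o_j) \), and a normal loss that penalizes the difference between normalized occupancy gradients at a point and at a nearby point. The toy sphere field, the fixed offset, and all function names (`occupancy_render_weights`, `toy_occupancy`, `numerical_normal`, `normal_smoothness_loss`) are hypothetical and not taken from the paper.

```python
import math


def _sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def occupancy_render_weights(occupancies):
    """Per-sample weights w_i = o_i * prod_{j<i} (1 - o_j) along one ray.

    Assumes UNISURF-style occupancy rendering; `occupancies` lie in [0, 1]
    and are ordered from near to far.
    """
    weights = []
    transmittance = 1.0  # probability that no earlier sample was occupied
    for o in occupancies:
        weights.append(o * transmittance)
        transmittance *= 1.0 - o
    return weights


def toy_occupancy(p, sharpness=10.0):
    """Hypothetical occupancy field: a soft unit sphere, close to 1 inside."""
    radius = math.sqrt(sum(c * c for c in p))
    return _sigmoid(-sharpness * (radius - 1.0))


def numerical_normal(field, p, h=1e-4):
    """Unit normal from the central-difference gradient of an occupancy field."""
    grad = []
    for axis in range(3):
        fwd, bwd = list(p), list(p)
        fwd[axis] += h
        bwd[axis] -= h
        grad.append((field(fwd) - field(bwd)) / (2.0 * h))
    norm = math.sqrt(sum(g * g for g in grad)) or 1.0  # avoid dividing by zero
    return [g / norm for g in grad]


def normal_smoothness_loss(field, points, offset=0.01):
    """Mean distance between unit normals at x and at a nearby point x + offset.

    Assumption: a fixed diagonal offset keeps the sketch deterministic; a
    training loop would typically sample a small random perturbation instead.
    """
    total = 0.0
    for p in points:
        n_here = numerical_normal(field, p)
        n_near = numerical_normal(field, [c + offset for c in p])
        total += math.sqrt(sum((a - b) ** 2 for a, b in zip(n_here, n_near)))
    return total / len(points)
```

For example, `occupancy_render_weights([0.0, 0.2, 0.9, 1.0])` yields weights that sum to 1 (up to floating-point rounding) because the last sample is fully occupied, and `normal_smoothness_loss` stays small on the smooth toy sphere. A real model would obtain the gradient through automatic differentiation rather than finite differences.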
What problem does this paper attempt to address?
This paper addresses high-quality 3D reconstruction from sparse multi-view images. Existing 3D reconstruction methods struggle to capture complex geometric details from sparse inputs and handle occluded regions poorly. Neural Radiance Fields (NeRFs), meanwhile, model complex scenes very well but cannot directly yield meaningful geometry. To address this, the paper uses transfer learning to quickly transfer the features of a pre-trained generalizable NeRF model (e.g., GeoNeRF) to an implicit occupancy network, obtaining an accurate occupancy field representation. Specifically, the pre-trained generalizable NeRF captures detailed scene radiance information, and this knowledge is then transferred to train a generalizable implicit occupancy network. This improves reconstruction accuracy and shortens training time from several days to about 3.5 hours.

### Main contributions:

1. **Fast adaptation and transfer**: Transfer learning quickly turns an existing state-of-the-art generalizable NeRF model (e.g., GeoNeRF) into a generalizable occupancy network, enabling 3D structure reconstruction from sparse views.
2. **Significantly reduced training time**: Training time drops from several days to about 3.5 hours while state-of-the-art reconstruction performance is maintained.
3. **Novel loss functions**: A loss on volume rendering weights and a surface-normal smoothness loss further improve occupancy field learning and reconstruction accuracy.

### Core ideas of the solution:

- **Transfer learning**: Start from a pre-trained generalizable NeRF model (e.g., GeoNeRF) and learn the occupancy field by fine-tuning its features.
- **Occupancy field prediction**: A new sigmoid-activated implicit decoder \( f_o \) predicts the occupancy field.
- **Loss function design**: The training objective combines a volume rendering loss, a density loss, and an occupancy distillation loss, among others, so that the model learns an accurate occupancy field and produces high-quality 3D reconstructions.

The method achieves state-of-the-art reconstruction accuracy on the DTU dataset and, without any retraining, generalizes well to the BlendedMVS dataset (shown qualitatively).
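The summary states that a new sigmoid-activated decoder \( f_o \) predicts occupancy from transferred NeRF features and that training includes an occupancy distillation loss, without giving formulas. A minimal sketch under explicit assumptions: \( f_o \) is a single linear layer plus sigmoid over a fixed feature vector, and distillation uses binary cross-entropy against the standard NeRF opacity \( \alpha = 1 - e^{-\sigma\delta} \) as a soft occupancy target. For simplicity it updates only the new head while keeping the features frozen, whereas the summary describes fine-tuning the transferred features too. The names `occupancy_decoder`, `density_to_alpha`, `distillation_loss`, and `sgd_step`, the toy features, and the choice of target are hypothetical.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def occupancy_decoder(feature, weights, bias):
    """Hypothetical f_o: one linear layer plus sigmoid on a transferred NeRF feature."""
    logit = sum(w * f for w, f in zip(weights, feature)) + bias
    return sigmoid(logit)


def density_to_alpha(sigma, delta):
    """Standard NeRF opacity of a ray segment of length delta with density sigma."""
    return 1.0 - math.exp(-sigma * delta)


def bce(pred, target, eps=1e-7):
    """Binary cross-entropy, clipping pred so that log() stays finite."""
    pred = min(max(pred, eps), 1.0 - eps)
    return -(target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred))


def distillation_loss(features, sigmas, delta, weights, bias):
    """Mean BCE between f_o(feature) and the NeRF-derived alpha (assumed target)."""
    losses = [
        bce(occupancy_decoder(f, weights, bias), density_to_alpha(s, delta))
        for f, s in zip(features, sigmas)
    ]
    return sum(losses) / len(losses)


def sgd_step(features, targets, weights, bias, lr=0.5):
    """One gradient step on the new head only; the features are kept frozen here."""
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    for f, t in zip(features, targets):
        err = occupancy_decoder(f, weights, bias) - t  # dBCE/dlogit for a sigmoid output
        for k, fk in enumerate(f):
            grad_w[k] += err * fk
        grad_b += err
    n = len(features)
    new_weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return new_weights, bias - lr * grad_b / n
```

Calling `sgd_step` repeatedly with targets from `density_to_alpha` lowers `distillation_loss` on toy inputs. In a real model the features would come from the pre-trained NeRF encoder and the gradients from automatic differentiation instead of this hand-written update.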