Abstract: This paper presents a novel approach for sparse 3D reconstruction by leveraging the expressive power of Neural Radiance Fields (NeRFs) and fast transfer of their features to learn accurate occupancy fields. Existing 3D reconstruction methods from sparse inputs still struggle with capturing intricate geometric details and can suffer from limitations in handling occluded regions. On the other hand, NeRFs excel in modeling complex scenes but do not offer means to extract meaningful geometry. Our proposed method offers the best of both worlds by transferring the information encoded in NeRF features to derive an accurate occupancy field representation. We utilize a pre-trained, generalizable state-of-the-art NeRF network to capture detailed scene radiance information, and rapidly transfer this knowledge to train a generalizable implicit occupancy network. This process helps in leveraging the knowledge of the scene geometry encoded in the generalizable NeRF prior and refining it to learn occupancy fields, facilitating a more precise generalizable representation of 3D space. The transfer learning approach leads to a dramatic reduction in training time, by orders of magnitude (i.e., from several days to 3.5 hours), obviating the need to train generalizable sparse surface reconstruction methods from scratch. Additionally, we introduce a novel loss on volumetric rendering weights that helps in the learning of accurate occupancy fields, along with a normal loss that helps in global smoothing of the occupancy fields. We evaluate our approach on the DTU dataset and demonstrate state-of-the-art performance in terms of reconstruction accuracy, especially in challenging scenarios with sparse input data and occluded regions. We furthermore demonstrate the generalization capabilities of our method by showing qualitative results on the BlendedMVS dataset without any retraining.

What problem does this paper attempt to address?

This paper addresses high-quality 3D reconstruction from sparse multi-view images. Existing 3D reconstruction methods struggle to capture intricate geometric detail from sparse inputs and have limitations in handling occluded regions. NeRFs (Neural Radiance Fields), meanwhile, excel at modeling complex scenes but offer no direct means of extracting meaningful geometry. To address both issues, the paper proposes a transfer-learning method: the features of a pre-trained generalizable NeRF model (such as GeoNeRF) are rapidly transferred to an implicit occupancy network to obtain an accurate occupancy field representation. Specifically, the pre-trained generalizable NeRF captures detailed scene radiance information, and this knowledge is transferred into the training of a generalizable implicit occupancy network. This improves reconstruction accuracy and shortens training time from several days to several hours (3.5 hours, per the abstract).

### Main contributions:

1. **Fast adaptation and transfer**: Through transfer learning, an existing state-of-the-art generalizable NeRF model (such as GeoNeRF) is quickly converted into a generalizable occupancy network, enabling 3D structure reconstruction from sparse views.
2. **Significantly reduced training time**: Training time drops from several days to several hours while state-of-the-art reconstruction performance is maintained.
3. **Novel loss functions**: A loss on volume rendering weights and a surface normal smoothness loss further improve the learned occupancy field and the reconstruction accuracy.

### Core ideas of the solution:

- **Transfer learning**: Use the pre-trained generalizable NeRF model (such as GeoNeRF) and learn the occupancy field by fine-tuning its features.
- **Occupancy field prediction**: Predict the occupancy field with a new sigmoid-activated implicit decoder \( f_o \).
- **Loss function design**: Combine a volume rendering loss, a density loss, an occupancy distillation loss, and others, so that the model learns an accurate occupancy field and produces high-quality 3D reconstructions.

The method achieves state-of-the-art reconstruction accuracy on the DTU dataset and also generalizes well to the BlendedMVS dataset without any fine-tuning.
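The occupancy prediction and rendering-weight ideas summarized above can be illustrated with a small sketch. It assumes a UNISURF-style formulation in which a sigmoid maps the decoder output to an occupancy in (0, 1) and the weight of each ray sample is its occupancy times the probability that all earlier samples were free space. The function names and the example logits are hypothetical, and the paper's actual decoder \( f_o \) and its loss on these weights are not specified in this summary, so they are not reproduced here.

```python
import math

def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))

def occupancy_from_logits(logits):
    """Map raw decoder outputs (hypothetical stand-in for f_o) to occupancies in (0, 1)."""
    return [sigmoid(z) for z in logits]

def rendering_weights(occupancy):
    """Assumed UNISURF-style weights: w_i = o_i * prod_{j<i} (1 - o_j)."""
    weights = []
    transmittance = 1.0  # probability that the ray has not hit a surface yet
    for o in occupancy:
        weights.append(o * transmittance)
        transmittance *= 1.0 - o
    return weights

def render_color(weights, colors):
    """Blend per-sample RGB colors along the ray with the rendering weights."""
    return tuple(sum(w * c[k] for w, c in zip(weights, colors)) for k in range(3))

# Illustrative ray: free space first, then a surface near the fourth sample.
logits = [-6.0, -4.0, 0.0, 5.0, 6.0]
occ = occupancy_from_logits(logits)
w = rendering_weights(occ)
```

Under this formulation the weights sum to one minus the probability that the ray passes through every sample unoccupied, so a ray that clearly hits a surface has weights summing to nearly 1, concentrated around the first high-occupancy sample.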

GeoTransfer: Generalizable Few-Shot Multi-View Reconstruction via Transfer Learning

MPS-NeRF: Generalizable 3D Human Rendering from Multiview Images

MVSNeRF: Fast Generalizable Radiance Field Reconstruction from Multi-View Stereo

GeoNeRF: Generalizing NeRF with Geometry Priors

Omni-Recon: Harnessing Image-based Rendering for General-Purpose Neural Radiance Fields

Geometry-aware Reconstruction and Fusion-refined Rendering for Generalizable Neural Radiance Fields

NeuralTO: Neural Reconstruction and View Synthesis of Translucent Objects

Learning Robust Generalizable Radiance Field with Visibility and Feature Augmented Point Representation

Res-NeuS: Deep Residuals and Neural Implicit Surface Learning for Multi-View Reconstruction

VRS-NeRF: Visual Relocalization with Sparse Neural Radiance Field

Cascaded and Generalizable Neural Radiance Fields for Fast View Synthesis

UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction

OmniNeRF: Hybriding Omnidirectional Distance and Radiance fields for Neural Surface Reconstruction

TomoGRAF: A Robust and Generalizable Reconstruction Network for Single-View Computed Tomography

Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks

Georecon: a coarse-to-fine visual 3D reconstruction approach for high-resolution images with neural matching priors

Explicit Correspondence Matching for Generalizable Neural Radiance Fields

ReconFusion: 3D Reconstruction with Diffusion Priors

SGCNeRF: Few-Shot Neural Rendering via Sparse Geometric Consistency Guidance

Generative Lifting of Multiview to 3D from Unknown Pose: Wrapping NeRF inside Diffusion

3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction