Abstract: Neural Radiance Fields (NeRF) have achieved huge success in effectively capturing and representing 3D objects and scenes. However, to establish a ubiquitous presence in everyday media formats, such as images and videos, we need to fulfill three key objectives: 1. fast encoding and decoding time, 2. compact model sizes, and 3. high-quality renderings. Despite recent advancements, a comprehensive algorithm that adequately addresses all objectives has yet to be fully realized. In this work, we present CodecNeRF, a neural codec for NeRF representations, consisting of an encoder and decoder architecture that can generate a NeRF representation in a single forward pass. Furthermore, inspired by the recent parameter-efficient finetuning approaches, we propose a finetuning method to efficiently adapt the generated NeRF representations to a new test instance, leading to high-quality image renderings and compact code sizes. The proposed CodecNeRF, a newly suggested encoding-decoding-finetuning pipeline for NeRF, achieved unprecedented compression performance of more than 100x and remarkable reduction in encoding time while maintaining (or improving) the image quality on widely used 3D object datasets.
What problem does this paper attempt to address?
The paper addresses three challenges that NeRF (Neural Radiance Fields) faces in practical use: fast encoding and decoding, compact model size, and high-quality rendering. Although NeRF has been very successful at capturing and representing 3D objects and scenes, it must meet all three goals before it can become an everyday media format alongside images and videos. According to the paper, recent advances have not yet produced a method that adequately meets all three at once.
Specifically, the paper proposes CodecNeRF, a neural codec architecture that generates a NeRF representation in a single forward pass, targeting fast encoding and decoding, compact models, and high-quality view synthesis. It also introduces a parameter-efficient fine-tuning method that adapts the generated representation to a new test instance, further improving rendering quality and code compactness.
### Main Contributions
1. **Propose CodecNeRF**: An encoding-decoding-fine-tuning pipeline for NeRF representations.
2. **Design a 3D-aware encoder-decoder architecture**: Efficiently aggregates multi-view images, produces compact codes, and generates NeRF representations from those codes.
3. **Parameter-efficient fine-tuning method**: Fine-tunes the generated NeRF representation, which consists of an MLP and feature planes.
4. **Achieve unprecedented compression ratios and encoding speedups**: Compresses NeRF efficiently and encodes it quickly while maintaining high-quality rendering.
### Technical Details
- **Overall Architecture**: The encoder and decoder architectures of CodecNeRF are shown in Figure 1. The input is N images taken from different viewpoints. They pass through a 2D image feature extraction module and then a projection and aggregation module, which produces 3D features. Axis-aligned average pooling reduces these 3D features to three 2D feature maps, from which the triplane module generates multi-resolution triplanes.
- **3D Feature Compression**: Uses the explicit-implicit hybrid NeRF representation (triplane), which decomposes the 3D volume into three 2D planes and so reduces the number of bits required for storage and transmission.
- **Multi-resolution Triplanes**: Introduces hierarchical 3D-aware convolution blocks to generate multi-resolution triplanes, promoting spatial smoothness at different scales and better convergence.
- **Training Objectives**: Optimizes the model with an L2 loss, an LPIPS loss, and total variation regularization.
- **Parameter-efficient Fine-tuning**: Uses low-rank adaptation (LoRA) to fine-tune the triplanes and the MLP, significantly reducing the number of trainable parameters required for test-time optimization.
- **Entropy-coding Fine-tuning Increments**: Entropy-codes the fine-tuning increments with a neural compression method, further compressing the model.
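Two of the steps listed above, axis-aligned average pooling and a LoRA-style low-rank update, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the tensor shapes, the rank, the function names `axis_aligned_pool` and `lora_update`, and the zero initialization of `lora_b` are all assumptions chosen for illustration, and the LoRA part is shown only for a single dense weight matrix rather than for the triplanes.

```python
import numpy as np


def axis_aligned_pool(volume):
    """Average a (C, X, Y, Z) feature volume along each spatial axis.

    Returns three 2D feature maps: XY (pooled over Z), XZ (pooled over Y),
    and YZ (pooled over X). Shapes are hypothetical, not from the paper.
    """
    plane_xy = volume.mean(axis=3)  # (C, X, Y)
    plane_xz = volume.mean(axis=2)  # (C, X, Z)
    plane_yz = volume.mean(axis=1)  # (C, Y, Z)
    return plane_xy, plane_xz, plane_yz


def lora_update(weight, lora_a, lora_b, scale=1.0):
    """Apply a low-rank update: weight + scale * (lora_b @ lora_a)."""
    return weight + scale * (lora_b @ lora_a)


rng = np.random.default_rng(0)

# Hypothetical 3D feature volume: 8 channels on a 16^3 grid.
volume = rng.standard_normal((8, 16, 16, 16))
planes = axis_aligned_pool(volume)

# Storage comparison: one dense volume vs. three 2D planes.
volume_values = volume.size                     # 8 * 16 * 16 * 16
plane_values = sum(p.size for p in planes)      # 3 * 8 * 16 * 16

# Hypothetical 64x64 MLP weight adapted with rank-4 factors.
d_out, d_in, rank = 64, 64, 4
weight = rng.standard_normal((d_out, d_in))     # frozen base weight
lora_a = rng.standard_normal((rank, d_in)) * 0.01
lora_b = np.zeros((d_out, rank))                # zero init: update starts at 0
adapted = lora_update(weight, lora_a, lora_b)

full_params = weight.size                       # trainable if fully fine-tuned
lora_params = lora_a.size + lora_b.size         # trainable with LoRA

print(volume_values, plane_values)
print(full_params, lora_params)
```

In this toy setting the three planes hold far fewer values than the dense volume, and the low-rank factors hold far fewer trainable values than the full weight matrix, which is the intuition behind both the triplane decomposition and the parameter-efficient fine-tuning step. The actual sizes and savings in CodecNeRF depend on details not given in this summary.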
### Experimental Results
- **Datasets**: Experiments were carried out on the Objaverse, Google Scanned Objects (GSO), and DTU datasets.
- **Quantitative Evaluation**: Table 1 shows the quantitative results on the Objaverse and GSO datasets. CodecNeRF significantly outperforms the baseline model in metrics such as PSNR, SSIM, and MS-SSIM, and achieves a 100-fold compression in storage requirements.
- **Qualitative Evaluation**: Figure 3 shows the novel-view synthesis results on the Objaverse dataset, demonstrating the fast encoding ability and high-quality rendering of CodecNeRF.
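For readers unfamiliar with PSNR, one of the metrics mentioned above, the following is a minimal sketch of the standard definition, assuming images stored as floating-point arrays in the range [0, 1]. The function name `psnr` and the `max_value` parameter are illustrative choices, and the paper's own evaluation code may differ in details such as color space or masking.

```python
import numpy as np


def psnr(rendered, reference, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    mse = np.mean((rendered - reference) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)


# Toy example: a uniform error of 0.1 per pixel gives an MSE of 0.01.
reference = np.zeros((4, 4, 3))
rendered = np.full((4, 4, 3), 0.1)
score = psnr(rendered, reference)
print(round(score, 2))  # 20.0
```

Higher PSNR means the rendered view is closer to the reference image; SSIM and MS-SSIM instead compare local structure and are not covered by this sketch.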
In conclusion, CodecNeRF addresses the key challenges NeRF faces in practical applications through its encoding-decoding-fine-tuning pipeline, offering a new approach to the efficient processing of 3D media.