Abstract: We propose Gumbel-NeRF, a mixture-of-experts (MoE) neural radiance fields (NeRF) model with a hindsight expert selection mechanism for synthesizing novel views of unseen objects. Previous studies have shown that the MoE structure provides high-quality representations of a given large-scale scene consisting of many objects. However, we observe that such an MoE NeRF model often produces low-quality representations in the vicinity of experts' boundaries when applied to the task of novel view synthesis of an unseen object from one/few-shot input. We find that this deterioration is primarily caused by the foresight expert selection mechanism, which may leave an unnatural discontinuity in the object shape near the experts' boundaries. Gumbel-NeRF adopts a hindsight expert selection mechanism, which guarantees continuity in the density field even near the experts' boundaries. Experiments using the SRN cars dataset demonstrate the superiority of Gumbel-NeRF over the baselines in terms of various image quality metrics.
### What problem does this paper attempt to solve?
This paper addresses the problem of constructing high-quality 3D representations of unseen car instances from one or a few 2D observation images. To that end, the authors propose Gumbel-NeRF, a model for synthesizing novel-view images of unseen objects in this one/few-shot setting.
#### Background and problem description
1. **Limitations of existing methods**:
- **NeRF (Neural Radiance Field)**: Although NeRF performs excellently in view synthesis for a single scene, it requires a large number of training images and is individually optimized for each scene, which has limitations when dealing with unseen instances.
- **CodeNeRF**: It extends NeRF by introducing instance-specific latent codes to handle multiple unseen instances, but a single global latent code sometimes struggles to represent semantic categories with diverse shapes and appearances.
- **Switch-NeRF**: It uses a Mixture of Experts (MoE) structure to better represent large-scale scenes, but when applied to novel view synthesis of unseen instances it produces unnatural discontinuities, especially near the boundaries between experts.
2. **Specific problems**:
- When applied to novel view synthesis of unseen objects, existing MoE NeRF models tend to produce low-quality representations near the expert boundaries, resulting in discontinuities in the object shape and a visually unnatural result.
- This discontinuity is mainly caused by the foresight expert selection mechanism: an expert is chosen for each input before any expert has processed it, so the best-suited expert may be missed and rendering quality suffers.
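The difference between selecting an expert before and after evaluation can be illustrated with a toy 1D example. This is only a sketch and not the paper's model: the two "experts" `expert_a` and `expert_b`, the fixed routing boundary at `x = 0`, and the helper `jump_at` are all hypothetical names and values chosen for illustration.

```python
def expert_a(x):
    """Hypothetical expert A: a density bump centred at x = -0.5."""
    return max(0.0, 1.0 - abs(x + 0.5))


def expert_b(x):
    """Hypothetical expert B: a density bump centred at x = 0.3."""
    return max(0.0, 1.0 - abs(x - 0.3))


def foresight_density(x, boundary=0.0):
    """Route first, evaluate second: only the pre-selected expert runs."""
    return expert_a(x) if x < boundary else expert_b(x)


def hindsight_density(x):
    """Evaluate every expert, then keep the largest density."""
    return max(expert_a(x), expert_b(x))


def jump_at(field, x0=0.0, eps=1e-6):
    """Size of the change in `field` across the point x0."""
    return abs(field(x0 + eps) - field(x0 - eps))


foresight_jump = jump_at(foresight_density)
hindsight_jump = jump_at(hindsight_density)
print(f"foresight jump at boundary: {foresight_jump:.3f}")
print(f"hindsight jump at boundary: {hindsight_jump:.6f}")
```

In this toy setting the foresight field jumps at the routing boundary because the two experts disagree there, while the hindsight field does not: a pointwise maximum of continuous functions is itself continuous, so a jump could only come from an individual expert being discontinuous.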
#### Proposed solutions
To solve the above problems, the authors propose Gumbel-NeRF, whose main features include:
1. **Hindsight Expert Selection Mechanism**:
- Gumbel-NeRF adopts a density-based hindsight selection mechanism that keeps the density field continuous even near the expert boundaries. Specifically, every expert processes the input first, and the expert with the highest density is then selected through max-pooling, instead of the selection being made before the experts are evaluated.
2. **Part-Specific Experts**:
- Each expert is associated with part-specific latent codes that represent different parts of a given car (such as wheels, rooftops, doors, etc.). In this way, each expert can learn to model its corresponding object part without explicit supervision.
3. **Rival-to-Expert Training Strategy**:
- A temperature parameter τ controls the randomness of expert selection. A high temperature early in training introduces more randomness, so that all experts have the opportunity to be selected and receive gradient updates. As training progresses, the temperature is gradually decreased, making the experts more stable and specialized and avoiding the routing-collapse problem.
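The hindsight selection and the temperature schedule described above can be sketched together in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the randomness enters as Gumbel noise scaled by τ and added to the per-expert densities (the Gumbel-max trick), and the function names `sample_gumbel`, `hindsight_select`, and `annealed_tau`, as well as the linear decay schedule, are hypothetical.

```python
import math
import random


def sample_gumbel(rng):
    """Draw one standard Gumbel(0, 1) sample via inverse transform."""
    u = rng.random()
    # Clamp away from 0 and 1 so both logarithms stay finite.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def hindsight_select(densities, tau=0.0, rng=None):
    """Pick an expert after all experts have been evaluated.

    densities: per-expert density predictions at one 3D point.
    tau: temperature; 0 gives plain max-pooling, larger values add randomness.
    Returns (expert_index, density_of_that_expert).
    """
    if tau > 0.0:
        rng = rng or random.Random()
        # Assumed noise model: density plus temperature-scaled Gumbel noise.
        scores = [d + tau * sample_gumbel(rng) for d in densities]
    else:
        scores = list(densities)
    idx = max(range(len(scores)), key=scores.__getitem__)
    return idx, densities[idx]


def annealed_tau(step, total_steps, tau_start=1.0, tau_end=0.0):
    """Linearly decay the temperature (hypothetical schedule)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return tau_start + frac * (tau_end - tau_start)
```

With `tau=0.0` the function reduces to max-pooling over the expert densities. With a large `tau`, weaker experts are also picked from time to time, which is the behaviour the early training stage relies on so that every expert receives gradient updates.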
#### Experimental results
In experiments on the ShapeNet-SRN cars dataset, Gumbel-NeRF outperforms baseline methods (such as CodeNeRF and Coded Switch-NeRF) on multiple image-quality metrics, indicating that it adapts better to the details of unseen instances.
In summary, by replacing foresight expert selection with hindsight selection and introducing part-specific experts, this paper addresses the discontinuities and low-quality representations that existing methods produce in novel view synthesis of unseen instances, and improves the quality and continuity of the synthesized images.