Abstract: Neural Radiance Field (NeRF) is a representation for 3D reconstruction from multi-view images. Despite some recent work showing preliminary success in editing a reconstructed NeRF with diffusion prior, they remain struggling to synthesize reasonable geometry in completely uncovered regions. One major reason is the high diversity of synthetic contents from the diffusion model, which hinders the radiance field from converging to a crisp and deterministic geometry. Moreover, applying latent diffusion models on real data often yields a textural shift incoherent to the image condition due to auto-encoding errors. These two problems are further reinforced with the use of pixel-distance losses. To address these issues, we propose tempering the diffusion model's stochasticity with per-scene customization and mitigating the textural shift with masked adversarial training. During the analyses, we also found the commonly used pixel and perceptual losses are harmful in the NeRF inpainting task. Through rigorous experiments, our framework yields state-of-the-art NeRF inpainting results on various real-world scenes. Project page:

What problem does this paper attempt to address?

This paper addresses the shortcomings of existing Neural Radiance Field (NeRF) inpainting methods. Although NeRF achieves high-quality 3D reconstruction and novel view synthesis from multi-view images, existing inpainting methods still struggle to generate reasonable geometry in completely uncovered regions. In addition, applying a latent diffusion model (LDM) to 2D image inpainting usually produces a texture shift caused by auto-encoding errors, which introduces visible artifacts in the final inpainted NeRF.

To solve these problems, the authors propose the following measures:

1. **Reduce the randomness of the diffusion model**: Customize the diffusion model for each scene so that its outputs better match the characteristics of that specific scene.
2. **Alleviate texture shift**: Use masked adversarial training, which hides the inpainting boundaries so that the discriminator cannot use them to identify real image patches, thereby reducing the texture difference between the inpainted region and the reconstructed region.
3. **Optimize the design of the loss function**: The authors find that the commonly used pixel-level and perceptual losses are harmful to the NeRF inpainting task. They therefore propose a new combination of loss functions, including an adversarial loss and a discriminator feature-matching loss.

With these improvements, the proposed method (MALD-NeRF) achieves state-of-the-art NeRF inpainting results on multiple real-world scene datasets, especially in terms of high-frequency detail preservation and texture consistency.

### Specific Problem Summary

- **Problem Background**: NeRF inpainting produces unreasonable geometry in completely uncovered regions, and texture shift is likely to occur when an LDM is used for 2D image inpainting.
- **Solutions**:
  - Use masked adversarial training to reduce the texture difference between the inpainted region and the reconstructed region.
  - Reduce the randomness of the diffusion model by customizing it for each scene.
  - Design a new combination of loss functions to avoid the negative effects of the commonly used pixel-level and perceptual losses.
- **Experimental Results**: MALD-NeRF achieves better results than existing methods on multiple datasets, both in visual quality and in quantitative evaluation metrics such as FID and KID.

### Mathematical Formula Representation

- **Adversarial Loss**:
  \[ L_{\text{adv}} = f(D(C_m(x_m))) + f(-D(C_r(\hat{x}_r))) \]
  where \( f(x) = -\log(1 + \exp(-x)) \), \( C_m \) and \( C_r \) are the mask functions of the inpainted region and the non-inpainted region respectively, and \( D \) is the discriminator.
- **Discriminator Feature Matching Loss**:
  \[ L_{\text{fm}} = \| F(C_m(x_m)) - F(C_m(\hat{x}_m)) \|_1 \]
  where \( F \) denotes the features extracted from an intermediate layer of the discriminator.

These improvements make MALD-NeRF more robust and efficient in handling complex NeRF inpainting tasks.
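
The two losses above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: `toy_discriminator` is a hypothetical stand-in for \( D \) and \( F \), patches are flat lists of pixel values, and the mask functions \( C_m \) and \( C_r \) are assumed to be element-wise multiplication by a binary mask. Reading \( x \) as an inpainted patch from the diffusion model and \( \hat{x} \) as a patch rendered by the NeRF is also an assumption, since the summary does not define these symbols.

```python
import math


def f(x: float) -> float:
    """f(x) = -log(1 + exp(-x)), evaluated in a numerically stable way."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    # For x < 0, rewrite as x - log(1 + exp(x)) to avoid overflow in exp(-x).
    return x - math.log1p(math.exp(x))


def apply_mask(patch, mask):
    """Assumed form of C_m / C_r: zero out pixels where the mask is 0."""
    return [p * m for p, m in zip(patch, mask)]


def toy_discriminator(patch):
    """Hypothetical stand-in for D: returns (logit, intermediate features)."""
    mean = sum(patch) / len(patch)
    energy = sum(p * p for p in patch) / len(patch)
    features = [mean, energy]      # plays the role of F(.)
    logit = 2.0 * mean - energy    # plays the role of D(.)
    return logit, features


def adversarial_loss(x_m, x_hat_r, mask_m, mask_r):
    """L_adv = f(D(C_m(x_m))) + f(-D(C_r(x_hat_r)))."""
    logit_m, _ = toy_discriminator(apply_mask(x_m, mask_m))
    logit_r, _ = toy_discriminator(apply_mask(x_hat_r, mask_r))
    return f(logit_m) + f(-logit_r)


def feature_matching_loss(x_m, x_hat_m, mask_m):
    """L_fm = || F(C_m(x_m)) - F(C_m(x_hat_m)) ||_1."""
    _, feat = toy_discriminator(apply_mask(x_m, mask_m))
    _, feat_hat = toy_discriminator(apply_mask(x_hat_m, mask_m))
    return sum(abs(a - b) for a, b in zip(feat, feat_hat))


# Toy 4-pixel patches; the first two pixels form the inpainted region.
mask_m = [1, 1, 0, 0]
mask_r = [0, 0, 1, 1]
x = [0.2, 0.8, 0.5, 0.1]        # assumed: patch from the diffusion model
x_hat = [0.3, 0.6, 0.4, 0.2]    # assumed: patch rendered by the NeRF

l_adv = adversarial_loss(x, x_hat, mask_m, mask_r)
l_fm = feature_matching_loss(x, x_hat, mask_m)
```

Because both arguments of the feature-matching loss pass through the same mask, differences outside the inpainted region do not contribute to it, which matches the masking idea described in the summary. A real system would use a learned convolutional discriminator and automatic differentiation rather than these hand-written statistics.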

Taming Latent Diffusion Model for Neural Radiance Field Inpainting
