MVIP-NeRF: Multi-view 3D Inpainting on NeRF Scenes via Diffusion Prior

Honghua Chen, Chen Change Loy, Xingang Pan
2024-05-05
Abstract: Despite the emergence of successful NeRF inpainting methods built upon explicit RGB and depth 2D inpainting supervisions, these methods are inherently constrained by the capabilities of their underlying 2D inpainters. This is due to two key reasons: (i) independently inpainting constituent images results in view-inconsistent imagery, and (ii) 2D inpainters struggle to ensure high-quality geometry completion and alignment with inpainted RGB images.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses multi-view 3D inpainting of Neural Radiance Field (NeRF) scenes. Existing NeRF inpainting methods rely mainly on explicit RGB and depth inpainting priors: a 2D image inpainter fills in each view independently, and the method then tries to reconcile the resulting multi-view inconsistency. This approach has two main limitations:

1. **View Inconsistency**: Inpainting each constituent image independently produces view-inconsistent images, especially when the viewing angle changes significantly.
2. **Geometric Completion and Alignment**: 2D image inpainters have difficulty ensuring high-quality geometry completion and alignment with the inpainted RGB images.

To overcome these problems, the paper proposes MVIP-NeRF (Multi-view 3D Inpainting on NeRF Scenes via Diffusion Prior), which uses diffusion priors to inpaint NeRF scenes while addressing both appearance and geometry. Its main features are:

- **Joint Multi-view Inpainting**: An iterative optimization process based on Score Distillation Sampling (SDS) inpaints multiple views jointly to reach a consistent solution.
- **Geometric Representation**: In addition to restoring the rendered RGB images, the method extracts normal maps as a geometric representation and defines a normal SDS loss that encourages accurate geometry inpainting and alignment with the appearance.
- **Multi-view Score Function**: To handle large changes in viewing angle, a multi-view score function distills generative priors from images of different views simultaneously, which keeps the visual completion consistent across large view changes.
Through these innovations, MVIP-NeRF can achieve more consistent and more realistic inpainting results under multi-view conditions without explicit RGB or depth supervision.
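The SDS-based optimization described above can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: it does not reproduce the multi-view score function or an inpainting-conditioned diffusion model. The names `sds_gradient`, `make_point_mass_prior`, and `optimize_views` are hypothetical, the weighting w(t) = 1 - alpha_bar_t is an assumed choice, and the "prior" is a toy closed-form noise predictor standing in for a pretrained diffusion model. The sketch only shows the general SDS pattern: add noise to the renders, ask the prior to predict that noise, and use the weighted residual as the gradient for a whole stack of views at once.

```python
import numpy as np


def sds_gradient(rendered, predict_noise, alpha_bar_t, rng):
    """Single-sample SDS gradient with respect to rendered images.

    rendered      : array of renders, e.g. (views, H, W, C); RGB or normal maps.
    predict_noise : callable (noisy, alpha_bar_t) -> predicted noise. This is a
                    hypothetical stand-in for a pretrained diffusion prior.
    alpha_bar_t   : cumulative noise-schedule coefficient in (0, 1).
    """
    eps = rng.standard_normal(rendered.shape)
    # Forward diffusion: blend the renders with Gaussian noise.
    noisy = np.sqrt(alpha_bar_t) * rendered + np.sqrt(1.0 - alpha_bar_t) * eps
    eps_hat = predict_noise(noisy, alpha_bar_t)
    # Assumed weighting w(t) = 1 - alpha_bar_t applied to the noise residual.
    return (1.0 - alpha_bar_t) * (eps_hat - eps)


def make_point_mass_prior(target):
    """Toy prior whose data distribution is a single known image stack.

    For a point-mass distribution the exact noise prediction has the
    closed form below; a real prior would be a learned network.
    """
    def predict_noise(noisy, alpha_bar_t):
        return (noisy - np.sqrt(alpha_bar_t) * target) / np.sqrt(1.0 - alpha_bar_t)
    return predict_noise


def optimize_views(views, prior, steps=200, lr=0.5, seed=0):
    """Gradient descent on all views jointly.

    In a NeRF pipeline the gradient would be backpropagated through the
    renderer into the scene parameters; here the pixels themselves are
    the free parameters, which keeps the example self-contained.
    """
    rng = np.random.default_rng(seed)
    views = np.array(views, dtype=float)  # work on a float copy
    for _ in range(steps):
        alpha_bar_t = rng.uniform(0.1, 0.9)  # random noise level per step
        views -= lr * sds_gradient(views, prior, alpha_bar_t, rng)
    return views
```

Calling `optimize_views(np.zeros((3, 8, 8, 3)), make_point_mass_prior(target))` with a `target` array of the same shape drives the three views toward `target`. Under the same assumptions, a normal SDS term would apply `sds_gradient` to rendered normal maps and add its gradient to the RGB one.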