InFusion: Inpainting 3D Gaussians via Learning Depth Completion from Diffusion Prior

Zhiheng Liu, Hao Ouyang, Qiuyu Wang, Ka Leong Cheng, Jie Xiao, Kai Zhu, Nan Xue, Yu Liu, Yujun Shen, Yang Cao
2024-04-18
Abstract: 3D Gaussians have recently emerged as an efficient representation for novel view synthesis. This work studies its editability with a particular focus on the inpainting task, which aims to supplement an incomplete set of 3D Gaussians with additional points for visually harmonious rendering. Compared to 2D inpainting, the crux of inpainting 3D Gaussians is to figure out the rendering-relevant properties of the introduced points, whose optimization largely benefits from their initial 3D positions. To this end, we propose to guide the point initialization with an image-conditioned depth completion model, which learns to directly restore the depth map based on the observed image. Such a design allows our model to fill in depth values at an aligned scale with the original depth, and also to harness strong generalizability from large-scale diffusion prior. Thanks to the more accurate depth completion, our approach, dubbed InFusion, surpasses existing alternatives with sufficiently better fidelity and efficiency under various complex scenarios. We further demonstrate the effectiveness of InFusion with several practical applications, such as inpainting with user-specific texture or with novel object insertion.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The problem this paper addresses is how to inpaint 3D Gaussians effectively: how to supplement the missing parts of an incomplete 3D scene so that it renders in a visually harmonious way. Specifically, the work uses depth completion to initialize the positions of newly introduced points, which in turn helps optimize their rendering-relevant properties and improves both the quality and the efficiency of inpainting.

### Problem Background

3D Gaussians have become an important representation for novel view synthesis because they are efficient and support real-time processing. Editing them is harder, however. In inpainting tasks in particular, the challenge is to determine accurately where newly introduced points should be placed and what their rendering-relevant properties should be. Compared with 2D inpainting, inpainting 3D Gaussians is more complex because the new points must remain consistent and coherent with the existing geometric structure.

### Main Contributions of the Paper

1. **The InFusion method**: A depth completion model that learns from a diffusion prior predicts the depth values of missing regions more accurately.
2. **Improved inpainting quality and efficiency**: Compared with existing methods, InFusion provides higher fidelity in various complex scenes and is about 20 times faster.
3. **Wide application**: Beyond basic inpainting, InFusion also supports inpainting with user-specified texture and novel object insertion.

### Method Overview

To achieve these goals, the paper proposes the following key steps:

- **Depth completion model**: Use a pre-trained Latent Diffusion Model (LDM) for depth completion to generate a more accurate depth map.
- **3D Gaussians initialization**: Based on the completed depth map, project the new points into 3D space to form an initial set of 3D Gaussians.
- **Multi-view progressive inpainting**: For complex occlusions, adopt a multi-view progressive strategy that completes the inpainting step by step.

With these steps, InFusion improves the quality and efficiency of inpainting and also opens up new possibilities for 3D scene editing.

### Formula Display

For example, in the depth completion process, the paper normalizes depth with the following formula:

\[ d' = \frac{d - d_2}{d_{98} - d_2} \times 2 - 1 \]

where \( d_2 \) and \( d_{98} \) are the 2nd and 98th percentile values of a single depth map, respectively.
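The percentile normalization and the "project new points into 3D space" step can be illustrated with a small NumPy sketch. This is not the paper's code. The function names, the pinhole camera model, the intrinsics matrix `K`, the `cam_to_world` pose, the integer pixel-coordinate convention, and the small epsilon guard are all assumptions made for illustration.

```python
import numpy as np


def normalize_depth(depth, low_pct=2.0, high_pct=98.0):
    """Map a depth map to roughly [-1, 1] using its own percentiles.

    Implements d' = (d - d_2) / (d_98 - d_2) * 2 - 1 for a single depth map.
    Values outside the 2nd..98th percentile range fall outside [-1, 1].
    """
    d_lo = np.percentile(depth, low_pct)
    d_hi = np.percentile(depth, high_pct)
    scale = max(d_hi - d_lo, 1e-8)  # assumption: guard against a constant map
    normalized = (depth - d_lo) / scale * 2.0 - 1.0
    return normalized, (d_lo, d_hi)


def denormalize_depth(normalized, bounds):
    """Invert normalize_depth, given the stored percentile bounds."""
    d_lo, d_hi = bounds
    return (normalized + 1.0) / 2.0 * (d_hi - d_lo) + d_lo


def unproject_depth(depth, mask, K, cam_to_world):
    """Lift masked pixels to world-space 3D points.

    Assumes a pinhole camera with 3x3 intrinsics K, a 4x4 camera-to-world
    matrix, and depth measured along the camera z-axis.
    """
    v, u = np.nonzero(mask)  # pixel rows (v) and columns (u)
    z = depth[v, u]
    pixels = np.stack([u, v, np.ones_like(u)], axis=0).astype(np.float64)
    rays = np.linalg.inv(K) @ pixels  # 3 x N camera-space rays with z = 1
    pts_cam = rays * z                # scale each ray by its depth
    pts_h = np.vstack([pts_cam, np.ones((1, pts_cam.shape[1]))])
    return (cam_to_world @ pts_h)[:3].T  # N x 3 world-space points


if __name__ == "__main__":
    depth = np.linspace(1.0, 5.0, 100).reshape(10, 10)
    normalized, bounds = normalize_depth(depth)
    print(normalized.min(), normalized.max())

    K = np.array([[100.0, 0.0, 5.0], [0.0, 100.0, 5.0], [0.0, 0.0, 1.0]])
    mask = np.zeros((10, 10), dtype=bool)
    mask[5, 6] = True  # one "inpainted" pixel at row 5, column 6
    print(unproject_depth(depth, mask, K, np.eye(4)))
```

Keeping the percentile bounds alongside the normalized map makes it possible to return predictions to the scale of the original depth, which is what allows the lifted points to sit in the same coordinate frame as the existing scene.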