Gaussian3Diff: 3D Gaussian Diffusion for 3D Full Head Synthesis and Editing

Yushi Lan, Feitong Tan, Di Qiu, Qiangeng Xu, Kyle Genova, Zeng Huang, Sean Fanello, Rohit Pandey, Thomas Funkhouser, Chen Change Loy, Yinda Zhang
2023-12-20
Abstract: We present a novel framework for generating photorealistic 3D human heads and subsequently manipulating and reposing them with remarkable flexibility. The proposed approach leverages an implicit function representation of 3D human heads, employing 3D Gaussians anchored on a parametric face model. To enhance representational capabilities and encode spatial information, we embed a lightweight tri-plane payload within each Gaussian rather than directly storing color and opacity. Additionally, we parameterize the Gaussians in a 2D UV space via a 3DMM, enabling effective utilization of the diffusion model for 3D head avatar generation. Our method facilitates the creation of diverse and realistic 3D human heads with fine-grained editing over facial features and expressions. Extensive experiments demonstrate the effectiveness of our method.
Computer Vision and Pattern Recognition, Graphics, Machine Learning
What problem does this paper attempt to address?
This paper addresses the generation and editing of realistic 3D human head models, in particular high-quality, multi-view-consistent image generation and flexible editing of expression and shape. It proposes a new framework, **Gaussian3Diff**, for generating realistic 3D human heads and applying flexible global and local edits to them.

#### Main problems include:

1. **High-quality 3D human head generation**:
   - Existing methods are limited in generating high-resolution, multi-view-consistent 3D human heads, especially when fine details must be edited.
   - The paper improves generation quality with a representation based on 3D Gaussians and tri-plane payloads.
2. **Flexible editing capabilities**:
   - Existing methods edit 3D human heads poorly, especially for expression and region-level edits, and struggle to offer fine-grained control.
   - Gaussian3Diff supports flexible editing of shape, expression, and appearance by anchoring 3D Gaussians on a parametric face model (3DMM).
3. **Stability and diversity**:
   - Traditional generative adversarial networks (GANs) can be unstable in generation and editing quality and can lack diversity.
   - By using a diffusion model for unconditional generation, Gaussian3Diff aims for greater stability and diversity.
4. **Geometry and texture decoupling**:
   - Existing methods have difficulty decoupling geometry from texture, which produces unnatural results during editing.
   - Gaussian3Diff supports this decoupling through its 3D Gaussians and their tri-plane payloads, which leads to more natural edits.
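To illustrate why anchoring Gaussians on a parametric face model makes expression and shape edits possible, here is a minimal NumPy sketch. It assumes each Gaussian center is attached to a mesh triangle through barycentric coordinates; the toy one-triangle mesh, the function name `anchor_centers`, and the barycentric attachment itself are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def anchor_centers(vertices, faces, face_ids, bary):
    """Compute Gaussian centers from mesh anchors (hypothetical helper).

    vertices: (V, 3) mesh vertex positions
    faces:    (F, 3) integer vertex indices per triangle
    face_ids: (N,)   triangle index each Gaussian is attached to
    bary:     (N, 3) barycentric weights, each row summing to 1
    """
    tri = vertices[faces[face_ids]]             # (N, 3, 3) triangle corners
    return np.einsum("nk,nkd->nd", bary, tri)   # weighted sum of the corners

# Toy "face model": a single triangle in the z = 0 plane.
vertices = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
faces = np.array([[0, 1, 2]])
face_ids = np.array([0, 0])
bary = np.array([[1 / 3, 1 / 3, 1 / 3], [0.5, 0.5, 0.0]])

neutral = anchor_centers(vertices, faces, face_ids, bary)

# Stand-in for an expression edit: move vertex 2 along z.
edited_vertices = vertices.copy()
edited_vertices[2, 2] += 0.3
edited = anchor_centers(edited_vertices, faces, face_ids, bary)
```

Because the anchors are fixed and only the mesh changes, the first Gaussian follows the deformation (its z coordinate becomes 0.1), while the second, which has zero weight on the moved vertex, stays where it was.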
#### Specific solutions:

- **3D Gaussians with tri-plane payloads**: 3D human heads are represented by 3D Gaussians parameterized in UV space, and each Gaussian carries a tri-plane payload that encodes geometric and texture information.
- **Auto-decoder training**: An auto-decoder learns the 3D Gaussians from data generated by a pre-trained 3D GAN, providing the high-quality reconstructions on which the diffusion model is trained.
- **Application of the diffusion model**: The diffusion model performs unconditional generation and editing in UV space, supporting flexible global and local editing operations.

With these components, Gaussian3Diff can generate high-quality 3D human heads and offers editing capabilities relevant to applications such as virtual reality (VR), augmented reality (AR), digital games, and film production.
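The tri-plane payload idea can be sketched as follows: rather than storing one color and one opacity per Gaussian, each Gaussian holds three small feature planes that are queried at a point in its local coordinates. This is a simplified sketch under several assumptions: the `(3, C, R, R)` layout, the local coordinate range `[-1, 1]`, nearest-neighbor lookup (bilinear sampling would be the usual choice), and aggregation by summation are all illustrative, and `query_triplane` is a hypothetical name.

```python
import numpy as np

def query_triplane(planes, p):
    """Look up a feature vector for a local point (hypothetical helper).

    planes: (3, C, R, R) feature planes for the XY, XZ and YZ planes
    p:      (3,) point in the Gaussian's local frame, each coordinate in [-1, 1]
    """
    R = planes.shape[-1]
    # Map [-1, 1] to pixel indices; nearest neighbor keeps the sketch short.
    idx = np.clip(np.rint((p + 1.0) * 0.5 * (R - 1)).astype(int), 0, R - 1)
    x, y, z = idx
    f_xy = planes[0, :, y, x]   # projection onto the XY plane
    f_xz = planes[1, :, z, x]   # projection onto the XZ plane
    f_yz = planes[2, :, z, y]   # projection onto the YZ plane
    return f_xy + f_xz + f_yz   # aggregate the three features by summation

# Toy payload with C = 2 channels and R = 5 resolution.
planes = np.zeros((3, 2, 5, 5))
planes[0, :, 4, 0] = 1.0
planes[1, :, 2, 0] = 2.0
planes[2, :, 2, 4] = 4.0

feat = query_triplane(planes, np.array([-1.0, 1.0, 0.0]))
```

In a full pipeline a small decoder network would presumably turn such features into color and density; that step is omitted here because the summary does not describe it.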