GaussianStyle: Gaussian Head Avatar via StyleGAN

Pinxin Liu, Luchuan Song, Daoan Zhang, Hang Hua, Yunlong Tang, Huaijin Tu, Jiebo Luo, Chenliang Xu
2024-08-20
Abstract: Existing methods like Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have made significant strides in facial attribute control such as facial animation and component editing, yet they struggle with fine-grained representation and scalability in dynamic head modeling. To address these limitations, we propose GaussianStyle, a novel framework that integrates the volumetric strengths of 3DGS with the powerful implicit representation of StyleGAN. GaussianStyle preserves structural information, such as expressions and poses, using Gaussian points, while projecting the implicit volumetric representation into StyleGAN to capture high-frequency details and mitigate the over-smoothing commonly observed in neural texture rendering. Experimental outcomes indicate that our method achieves state-of-the-art performance in reenactment, novel view synthesis, and animation.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### The Problem the Paper Attempts to Solve

This paper addresses the fine-grained representation and scalability problems that existing methods encounter in dynamic head modeling. Specifically:

1. **Fine-grained Representation Issue**: Existing methods such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have made significant progress in facial animation and component editing, but they still struggle with fine-grained representation in dynamic head modeling. The result is overly smooth rendering, especially in regions with highly dynamic expressions and motion.
2. **Scalability Issue**: When modeling dynamic heads, existing methods assume that fixed 3D coordinates correspond to the same facial regions throughout the entire sequence. In reality, head movements and dynamic expressions significantly change the relative positions of facial features, so fixed-coordinate templates are difficult to align accurately with the actual facial regions.
3. **Cross-scene Reproduction Issue**: Existing methods perform poorly in cross-scene reproduction (such as new expressions, poses, or camera angles), particularly for novel head poses or camera angles, where results become blurry and noisy.

To address these issues, the paper proposes GaussianStyle, a novel framework that combines 3D Gaussian Splatting and StyleGAN. GaussianStyle tackles the problems above in the following ways:

- **Structural Information Retention**: Uses Gaussian points to retain structural information such as expressions and poses.
- **High-frequency Detail Capture**: Projects implicit volumetric representations into StyleGAN to capture high-frequency details, reducing the over-smoothing seen in neural texture rendering.
- **Dynamic Modeling**: Introduces a time-aware tri-plane representation and attention-based deformation modules to improve the robustness and accuracy of dynamic 4D face rendering.
- **Efficient Mapping**: Designs an efficient pipeline to map dynamic 3D representations into the latent space of StyleGAN, achieving high-quality volumetric 3D rendering while maintaining the generalization ability of pre-trained models.

Through these innovations, GaussianStyle achieves state-of-the-art performance in reenactment, novel view synthesis, and animation.
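
To make the dynamic-modeling idea more concrete, the following is a minimal sketch of how time-conditioned plane features could drive per-Gaussian position offsets. It is not the paper's implementation: the three space-time planes (x-t, y-t, z-t), the function names `bilinear_sample`, `time_aware_features`, and `deform_centers`, the feature sizes, and the use of a single linear map in place of the attention-based deformation module are all assumptions chosen for illustration.

```python
import numpy as np


def bilinear_sample(plane, u, v):
    """Bilinearly sample an (H, W, C) feature plane at N points.

    u indexes columns and v indexes rows; both are 1-D arrays in [0, 1].
    Returns an (N, C) array of interpolated features.
    """
    h, w, _ = plane.shape
    x = np.clip(u, 0.0, 1.0) * (w - 1)
    y = np.clip(v, 0.0, 1.0) * (h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (x - x0)[:, None]
    wy = (y - y0)[:, None]
    top = plane[y0, x0] * (1 - wx) + plane[y0, x1] * wx
    bottom = plane[y1, x0] * (1 - wx) + plane[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def time_aware_features(planes, xyz, t):
    """Concatenate features from three hypothetical space-time planes.

    planes: list of three (H, W, C) arrays for the x-t, y-t, and z-t planes.
    xyz:    (N, 3) Gaussian centers normalized to [0, 1].
    t:      scalar timestamp normalized to [0, 1].
    """
    ts = np.full(xyz.shape[0], float(t))
    feats = [bilinear_sample(planes[k], xyz[:, k], ts) for k in range(3)]
    return np.concatenate(feats, axis=1)


def deform_centers(xyz, feats, weight):
    """Shift Gaussian centers by offsets predicted from the features.

    A single linear map stands in for the attention-based deformation
    module named in the summary; this is an assumption of the sketch.
    """
    return xyz + feats @ weight


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    planes = [rng.normal(size=(16, 16, 4)) for _ in range(3)]
    xyz = rng.uniform(size=(5, 3))
    weight = rng.normal(scale=0.01, size=(12, 3))
    moved = deform_centers(xyz, time_aware_features(planes, xyz, 0.5), weight)
    print(moved.shape)
```

Because the features depend on the timestamp, the same Gaussian can be displaced differently at each frame, which is one way to avoid assuming that a fixed 3D coordinate always covers the same facial region.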