HeadGAP: Few-shot 3D Head Avatar via Generalizable Gaussian Priors

Xiaozheng Zheng, Chao Wen, Zhaohu Li, Weiyi Zhang, Zhuo Su, Xu Chang, Yang Zhao, Zheng Lv, Xiaoyuan Zhang, Yongjie Zhang, Guidong Wang, Lan Xu
2024-08-12
Abstract: In this paper, we present a novel 3D head avatar creation approach capable of generalizing from few-shot in-the-wild data with high-fidelity and animatable robustness. Given the underconstrained nature of this problem, incorporating prior knowledge is essential. Therefore, we propose a framework comprising prior learning and avatar creation phases. The prior learning phase leverages 3D head priors derived from a large-scale multi-view dynamic dataset, and the avatar creation phase applies these priors for few-shot personalization. Our approach effectively captures these priors by utilizing a Gaussian Splatting-based auto-decoder network with part-based dynamic modeling. Our method employs identity-shared encoding with personalized latent codes for individual identities to learn the attributes of Gaussian primitives. During the avatar creation phase, we achieve fast head avatar personalization by leveraging inversion and fine-tuning strategies. Extensive experiments demonstrate that our model effectively exploits head priors and successfully generalizes them to few-shot personalization, achieving photo-realistic rendering quality, multi-view consistency, and stable animation.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper mainly addresses the following issues:

1. **High-fidelity, animation-stable 3D head avatar creation**: The study proposes a new method called HeadGAP, which creates high-fidelity, animation-stable 3D head avatars from only a few images of the target person (or even a single image).
2. **Utilizing generalizable 3D Gaussian priors**: To achieve this goal, the method learns generalizable 3D Gaussian head priors from a large-scale dataset and uses them to create high-quality personalized 3D head avatars.
3. **Reducing data requirements**: Traditional 3D head avatar creation methods often require a large amount of multi-view or sequential data. HeadGAP reduces this requirement to a small number of input images, making it easier for ordinary users to create 3D avatars.

Specifically, HeadGAP consists of two stages:

- **Prior learning stage**: Embedding 3D prior knowledge into the Gaussian Prior Network (GAPNet) using multi-view dynamic data.
- **Few-shot personalization stage**: Creating 3D head avatars of new identities from a small number of input images using the learned prior knowledge.

The key contributions of HeadGAP include:

- Proposing a new framework that quickly personalizes 3D head avatars using generalizable 3D Gaussian priors, with high fidelity and consistent animation quality.
- Designing components that exploit part-based dynamic 3D Gaussian head priors and generalize them to few-shot head avatar personalization.
- Validating the effectiveness and robustness of the framework through comprehensive experiments and demonstrating its potential in real-world scenarios, including creating avatars from images captured with consumer-grade devices.

In summary, the paper addresses the problem of efficiently creating high-quality, animation-stable 3D head avatars from a small number of images and validates the method through a series of experiments.
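
The two-stage idea summarized above (a shared auto-decoder with one latent code per identity, followed by inversion and fine-tuning for a new identity) can be illustrated with a deliberately tiny sketch. This is not the paper's implementation. It assumes a linear decoder as a stand-in for GAPNet, plain attribute vectors in place of Gaussian primitives, no rendering or part-based dynamics, and "few-shot" modeled as observing only some attribute indices. All function names, learning rates, and step counts are hypothetical choices made for illustration.

```python
import random


def decode(W, z):
    """Shared linear decoder (assumed stand-in): latent code z -> attribute vector."""
    return [sum(w * zk for w, zk in zip(row, z)) for row in W]


def loss(W, z, target, observed):
    """Mean squared error over the observed attribute indices only."""
    pred = decode(W, z)
    return sum((pred[d] - target[d]) ** 2 for d in observed) / len(observed)


def step(W, z, target, observed, lr_z, lr_w):
    """One in-place gradient step on z, and on W when lr_w > 0."""
    pred = decode(W, z)
    err = {d: 2.0 * (pred[d] - target[d]) / len(observed) for d in observed}
    # Gradient w.r.t. z is computed before W is modified.
    grad_z = [sum(err[d] * W[d][k] for d in observed) for k in range(len(z))]
    if lr_w > 0.0:
        for d in observed:
            for k in range(len(z)):
                W[d][k] -= lr_w * err[d] * z[k]
    for k in range(len(z)):
        z[k] -= lr_z * grad_z[k]


def learn_prior(targets, latent_dim, epochs=300, lr=0.05, seed=0):
    """Stage 1: jointly fit the shared decoder and one latent code per identity."""
    rng = random.Random(seed)
    dim = len(targets[0])
    W = [[rng.uniform(-0.5, 0.5) for _ in range(latent_dim)] for _ in range(dim)]
    codes = [[rng.uniform(-0.5, 0.5) for _ in range(latent_dim)] for _ in targets]
    for _ in range(epochs):
        for z, y in zip(codes, targets):
            step(W, z, y, range(dim), lr, lr)
    return W, codes


def personalize(W_prior, target, observed, inv_steps=300, ft_steps=100):
    """Stage 2: inversion (decoder frozen), then light fine-tuning of both."""
    W = [row[:] for row in W_prior]   # copy so the learned prior stays untouched
    z = [0.0] * len(W[0])             # assumed initialisation for the new code
    history = [loss(W, z, target, observed)]
    for _ in range(inv_steps):        # inversion: optimise the latent code only
        step(W, z, target, observed, lr_z=0.05, lr_w=0.0)
    history.append(loss(W, z, target, observed))
    for _ in range(ft_steps):         # fine-tuning: optimise code and decoder
        step(W, z, target, observed, lr_z=0.05, lr_w=0.01)
    history.append(loss(W, z, target, observed))
    return W, z, history


def demo(seed=1):
    """Synthetic identities drawn from a hidden rank-2 model (toy data)."""
    rng = random.Random(seed)
    dim, rank = 8, 2
    basis = [[rng.uniform(-1, 1) for _ in range(rank)] for _ in range(dim)]

    def sample():
        return decode(basis, [rng.uniform(-1, 1) for _ in range(rank)])

    W, _ = learn_prior([sample() for _ in range(20)], latent_dim=rank)
    new_identity = sample()
    observed = [0, 2, 5, 7]           # only half of the attributes are observed
    return personalize(W, new_identity, observed)
```

Calling `demo()` returns the personalized decoder, the new latent code, and a three-entry loss history (before inversion, after inversion, after fine-tuning). The design point the sketch tries to convey is that inversion searches only within what the shared decoder can already express, while fine-tuning then adjusts the decoder itself to fit identity-specific detail.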