GPHM: Gaussian Parametric Head Model for Monocular Head Avatar Reconstruction

Yuelang Xu, Zhaoqi Su, Qingyao Wu, Yebin Liu
2024-10-23
Abstract: Creating high-fidelity 3D human head avatars is crucial for applications in VR/AR, digital humans, and film production. Recent advances have leveraged morphable face models to generate animated head avatars from easily accessible data, representing varying identities and expressions within a low-dimensional parametric space. However, existing methods often struggle to model complex appearance details, e.g., hairstyles, and suffer from low rendering quality and efficiency. In this paper, we introduce a novel approach, the 3D Gaussian Parametric Head Model, which employs 3D Gaussians to accurately represent the complexities of the human head, allowing precise control over both identity and expression. The Gaussian model can handle intricate details, enabling realistic representations of varying appearances and complex expressions. Furthermore, we present a well-designed training framework that ensures smooth convergence, providing a robust foundation for learning rich content. Our method achieves high-quality, photo-realistic rendering with real-time efficiency, making it a valuable contribution to the field of parametric head models. Finally, we apply the 3D Gaussian Parametric Head Model to monocular video or few-shot head avatar reconstruction tasks, enabling instant reconstruction of high-quality 3D head avatars even when input data is extremely limited and surpassing previous methods in reconstruction quality and training speed.
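The central idea of a parametric Gaussian head, a fixed set of 3D Gaussians whose attributes are driven by low-dimensional identity and expression codes, can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the authors' architecture: GPHM uses learned networks, whereas here every name (`ToyGaussianHead`, `make_basis`, `apply_basis`, the code sizes) is hypothetical and the networks are replaced by fixed random linear bases, in the spirit of a morphable-model blendshape basis. Only the Gaussian centers are deformed; a full model would also predict color, scale, rotation, and opacity per Gaussian.

```python
import math
import random

# Hypothetical sizes; the summary does not give GPHM's real dimensions.
NUM_POINTS = 4  # fixed number of Gaussians (real models use far more)
ID_DIM = 3      # identity code length
EXP_DIM = 2     # expression code length


def make_basis(num_points, code_dim, scale, rng):
    """Random per-point linear basis: basis[i][k] is point i's 3D offset per unit of code entry k."""
    return [[[rng.gauss(0.0, scale) for _ in range(3)] for _ in range(code_dim)]
            for _ in range(num_points)]


def apply_basis(point_basis, code):
    """Return sum_k code[k] * point_basis[k] as a 3D offset."""
    offset = [0.0, 0.0, 0.0]
    for coeff, direction in zip(code, point_basis):
        for axis in range(3):
            offset[axis] += coeff * direction[axis]
    return offset


class ToyGaussianHead:
    """Toy parametric model: canonical Gaussian centers plus code-driven offsets."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        # Canonical ("mean head") centers: random points on the unit sphere.
        self.canonical = []
        for _ in range(NUM_POINTS):
            v = [rng.gauss(0.0, 1.0) for _ in range(3)]
            norm = math.sqrt(sum(c * c for c in v))
            self.canonical.append([c / norm for c in v])
        # Stand-ins for the learned identity and expression networks.
        self.id_basis = make_basis(NUM_POINTS, ID_DIM, 0.05, rng)
        self.exp_basis = make_basis(NUM_POINTS, EXP_DIM, 0.02, rng)

    def decode(self, id_code, exp_code):
        """Return the Gaussian centers for one identity code and one expression code."""
        centers = []
        for i, base in enumerate(self.canonical):
            d_id = apply_basis(self.id_basis[i], id_code)
            d_exp = apply_basis(self.exp_basis[i], exp_code)
            centers.append([base[a] + d_id[a] + d_exp[a] for a in range(3)])
        return centers


head = ToyGaussianHead(seed=0)
neutral = head.decode([0.0] * ID_DIM, [0.0] * EXP_DIM)
smiling = head.decode([0.0] * ID_DIM, [1.0, 0.0])
print(len(neutral), len(smiling))
```

With both codes set to zero the decoder returns the canonical centers, and changing only the expression code moves the same fixed set of Gaussians, so the number of primitives never depends on the subject or the expression. The additive, linear form is chosen only for brevity; it makes expression offsets independent of identity, which a learned model need not be.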
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses several key problems in generating high-fidelity 3D human head avatars:

1. **Modeling complex appearance details**: Existing methods have difficulty modeling complex appearance details such as hairstyles, which leads to low rendering quality and efficiency.
2. **High-quality real-time rendering**: Existing methods struggle to achieve high rendering quality while maintaining real-time performance.
3. **Reconstruction from limited data**: Existing methods cannot reliably reconstruct high-quality 3D avatars when the input data is extremely limited, such as a monocular video or a small number of images.

To address these problems, the paper proposes the 3D Gaussian Parametric Head Model (GPHM). Its main contributions are:

- **3D Gaussian Parametric Head Model**: 3D Gaussians represent the complex structure of the human head, allowing control over identity and expression while capturing complex appearance details.
- **Efficient training strategy**: A two-stage training strategy ensures that the model converges stably while learning rich content details and complex expressions.
- **Instant avatar reconstruction framework**: GPHM is extended to reconstruct high-quality 3D avatars instantly from monocular videos or a small number of images, achieving state-of-the-art quality and training time.

Specifically, the paper tackles these problems with the following techniques:

- **Data pre-processing**: Multi-view video datasets and 3D scan datasets are used for training, and the data is pre-processed with steps such as background matting and facial alignment.
- **Guided geometric model**: A guide model based on an implicit signed distance field (SDF) representation is trained first and used to initialize the Gaussian model, giving the optimization a more effective starting point.
- **Gaussian Parametric Head Model**: High-fidelity 3D avatars are generated by combining a fixed number of Gaussian points with identity codes and expression codes.
- **Two-stage training strategy**: All networks are first trained coarsely on the mesh-based guide model; the network parameters are then transferred to the Gaussian model, and all Gaussian points are initialized so that they lie close to the actual surface.
- **Expression encoder and non-facial movement encoder**: A facial expression encoder and a non-facial movement encoder extract the facial expression codes and the non-facial movement codes, respectively, so these latent codes do not have to be optimized directly.

Through these techniques, the paper addresses multiple challenges in high-fidelity 3D human head avatar generation, achieving high-quality, real-time rendering and supporting instant reconstruction from limited data.
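The two-stage strategy places the Gaussian points close to the surface of the SDF guide model before Gaussian training begins. One standard way to move a point onto the zero level set of an SDF f is the projection x ← x − f(x) ∇f(x) / ‖∇f(x)‖. The sketch below assumes this projection (the summary does not say exactly how GPHM initializes the positions) and uses an analytic sphere SDF as a hypothetical stand-in for the learned guide network, with a finite-difference gradient; all function names are illustrative.

```python
import math


def sphere_sdf(p, radius=1.0):
    """Signed distance from p to a sphere at the origin (stand-in for a learned guide SDF)."""
    return math.sqrt(sum(c * c for c in p)) - radius


def sdf_gradient(sdf, p, eps=1e-5):
    """Central finite-difference estimate of the SDF gradient at p."""
    grad = []
    for axis in range(3):
        forward, backward = list(p), list(p)
        forward[axis] += eps
        backward[axis] -= eps
        grad.append((sdf(forward) - sdf(backward)) / (2.0 * eps))
    return grad


def project_to_surface(sdf, p, steps=3):
    """Repeatedly apply p <- p - f(p) * grad f(p) / |grad f(p)| to approach the zero level set."""
    p = list(p)
    for _ in range(steps):
        dist = sdf(p)
        grad = sdf_gradient(sdf, p)
        norm = math.sqrt(sum(g * g for g in grad))
        if norm == 0.0:
            break  # gradient vanishes here (e.g. the sphere center); leave the point unchanged
        p = [pc - dist * gc / norm for pc, gc in zip(p, grad)]
    return p


# Hypothetical initial Gaussian centers, some outside and some inside the guide surface.
initial = [[2.0, 0.0, 0.0], [0.3, -0.4, 0.1], [0.0, 5.0, -1.0]]
projected = [project_to_surface(sphere_sdf, p) for p in initial]
for q in projected:
    print(round(sphere_sdf(q), 6))
```

Because an exact SDF has a unit-norm gradient, a single step already lands on the sphere; a learned SDF is only approximately a distance field, which is why the sketch iterates a few times.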