Abstract: We propose GGAvatar, a novel 3D avatar representation designed to robustly model dynamic head avatars with complex identities and deformations. GGAvatar employs a coarse-to-fine structure, featuring two core modules: Neutral Gaussian Initialization Module and Geometry Morph Adjuster. Neutral Gaussian Initialization Module pairs Gaussian primitives with deformable triangular meshes, employing an adaptive density control strategy to model the geometric structure of the target subject with neutral expressions. Geometry Morph Adjuster introduces deformation bases for each Gaussian in global space, creating fine-grained low-dimensional representations of deformation behaviors to address the Linear Blend Skinning formula's limitations effectively. Extensive experiments show that GGAvatar can produce high-fidelity renderings, outperforming state-of-the-art methods in visual quality and quantitative metrics.
What problem does this paper attempt to address?
This paper attempts to solve the following problems:
1. **Creation of high-fidelity 3D avatars**: How to create high-quality 3D avatars that support real-time interaction, as required by the metaverse and related applications such as immersive telepresence, augmented reality, and virtual reality.
2. **Modeling of complex identities and deformations**: Existing methods struggle with complex identities (for example, subjects wearing accessories such as hats or glasses) and with extreme expressions or poses. How can dynamic avatars with complex identities and deformations be modeled effectively?
3. **Improvement of initialization and deformation strategies**:
- **Initialization problem**: How to correctly initialize the geometric structure in the early training stage to accelerate convergence?
- **Deformation problem**: How to design a deformation strategy so that 3D avatars generalize to unseen poses and expressions? In particular, how to capture the movement of small non-surface areas (such as wrinkles and hair)?
To solve these problems, the paper proposes **GGAvatar**, a new 3D avatar representation built on two core modules:
- **Neutral Gaussian Initialization Module**: This module pairs Gaussian primitives with deformable triangular meshes and applies an adaptive density control strategy, so that the geometric structure of the target subject under a neutral expression is modeled early and training converges faster.
- **Geometry Morph Adjuster**: This module introduces deformation bases for each Gaussian primitive in global space, giving a fine-grained, low-dimensional representation of deformation behavior that addresses the limitations of the Linear Blend Skinning (LBS) formula.
With these two modules, GGAvatar is reported to outperform existing methods in both visual quality and quantitative metrics, to produce high-fidelity renderings, and to perform well in novel-view synthesis and cross-identity reenactment.
### Formula summary
- **Definition of Gaussian primitives**:
\[
G(x)=e^{-\frac{1}{2}(x - \mu)^T\Sigma^{-1}(x - \mu)}
\]
where \(\mu\) is the mean vector and \(\Sigma\) is the covariance matrix.
- **Covariance matrix parameterization**:
\[
\Sigma = R S S^T R^T
\]
where \(R\) is a rotation matrix and \(S\) is a diagonal scaling matrix, parameterized by the learnable quaternion \(r\in\mathbb{R}^4\) and the scaling vector \(s\in\mathbb{R}^3\) respectively. This factorization keeps \(\Sigma\) positive semi-definite throughout optimization.
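
The Gaussian definition and the covariance parameterization above can be sketched together in a few lines. This is only an illustration, not the paper's implementation: the function names are hypothetical, the quaternion is assumed to be stored as \((w, x, y, z)\), and \(S\) is assumed to be \(\mathrm{diag}(s)\). Because \(R\) is orthogonal and \(S\) is diagonal, \(\Sigma^{-1} = R S^{-2} R^T\), so the exponent can be evaluated without an explicit matrix inverse.

```python
import math

def quat_to_rotmat(q):
    """Rotation matrix from a quaternion (w, x, y, z); the order is an assumption."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n  # normalize so R is a true rotation
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def covariance(q, s):
    """Sigma = R S S^T R^T with S = diag(s)."""
    R = quat_to_rotmat(q)
    # (R S)[i][k] = R[i][k] * s[k], and Sigma = (R S)(R S)^T
    RS = [[R[i][k] * s[k] for k in range(3)] for i in range(3)]
    return [[sum(RS[i][k] * RS[j][k] for k in range(3)) for j in range(3)]
            for i in range(3)]

def gaussian_value(x, mu, q, s):
    """G(x) = exp(-0.5 (x - mu)^T Sigma^{-1} (x - mu)), via Sigma^{-1} = R S^{-2} R^T."""
    R = quat_to_rotmat(q)
    d = [x[i] - mu[i] for i in range(3)]
    # Express the offset in the Gaussian's local frame: local = R^T d
    local = [sum(R[i][k] * d[i] for i in range(3)) for k in range(3)]
    # Squared Mahalanobis distance is the scale-normalized squared length
    m = sum((local[k] / s[k]) ** 2 for k in range(3))
    return math.exp(-0.5 * m)
```

With the identity quaternion, `covariance` reduces to a diagonal matrix holding the squared scales, which is a convenient sanity check.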
- **Pixel color calculation**:
\[
C=\sum_{i\in N}c_i\alpha_i\prod_{j = 1}^{i - 1}(1-\alpha_j)
\]
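where the sum runs over the depth-sorted Gaussians \(N\) that overlap the pixel, \(c_i\) is the color of the \(i\)-th Gaussian, represented by spherical harmonics, and \(\alpha_i\) is its blending weight, obtained by evaluating the Gaussian projected onto the image plane and multiplying the result by the per-Gaussian opacity \(o\).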
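
The pixel color formula above is ordinary front-to-back alpha compositing, and a minimal sketch may make the running product easier to read. The function name is hypothetical, and the inputs are assumed to be already sorted from nearest to farthest, with each \(\alpha_i\) already computed from the projected Gaussian and its opacity.

```python
def composite_pixel(colors, alphas):
    """Front-to-back compositing: C = sum_i c_i * alpha_i * prod_{j<i} (1 - alpha_j).

    colors: list of (r, g, b) tuples, nearest Gaussian first (assumed pre-sorted).
    alphas: list of blending weights in [0, 1], same order.
    Returns the pixel color and the transmittance left after the last Gaussian.
    """
    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0  # running product of (1 - alpha_j) over closer Gaussians
    for color, alpha in zip(colors, alphas):
        weight = alpha * transmittance
        for k in range(3):
            pixel[k] += weight * color[k]
        transmittance *= 1.0 - alpha
    return pixel, transmittance
```

Tracking the transmittance as a running value avoids recomputing the product for every Gaussian, and once it reaches zero the remaining Gaussians no longer contribute to the pixel.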
- **Calculation of additional deformation basis**:
\[
W_\Theta = F_H(H_3(x))
\]
\[
f = F_{\psi,\theta}(\psi,\theta)
\]
\[
\Delta\mu,\Delta r,\Delta s = W_\Theta\cdot f
\]
\[
(\mu, r, s) = (\mu'\oplus\Delta\mu,\; r'\oplus\Delta r,\; s'\oplus\Delta s)
\]
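
The last two equations can be illustrated with a small sketch that takes the per-Gaussian basis \(W_\Theta\) and the coefficient vector \(f\) as given, leaving aside the networks that produce them. The names and shapes are assumptions made for illustration: \(W_\Theta\) is treated as a \(10 \times K\) matrix whose rows map to \(\Delta\mu\) (3), \(\Delta r\) (4), and \(\Delta s\) (3). The summary does not define \(\oplus\), so this sketch assumes plain addition, followed by renormalizing the quaternion.

```python
import math

def apply_deformation(W, f, mu0, r0, s0):
    """Apply (delta_mu, delta_r, delta_s) = W . f to a Gaussian's base parameters.

    W:   10 x K basis for one Gaussian (assumed row layout: 3 mu, 4 r, 3 s).
    f:   K-dimensional coefficient vector shared by all Gaussians.
    mu0, r0, s0: base position, quaternion, and scale (mu', r', s' in the text).
    The composition operator is ASSUMED to be addition; the source leaves it undefined.
    """
    delta = [sum(w * c for w, c in zip(row, f)) for row in W]
    d_mu, d_r, d_s = delta[0:3], delta[3:7], delta[7:10]

    mu = [a + b for a, b in zip(mu0, d_mu)]
    s = [a + b for a, b in zip(s0, d_s)]

    r = [a + b for a, b in zip(r0, d_r)]
    norm = math.sqrt(sum(v * v for v in r))
    r = [v / norm for v in r]  # keep the rotation a unit quaternion
    return mu, r, s
```

Because \(f\) has only \(K\) entries, the offsets for every Gaussian are driven by the same low-dimensional code, while each Gaussian's own \(W_\Theta\) decides how that code moves it.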
Taken together, these formulas describe how GGAvatar adds learned per-Gaussian offsets to the parameters of each Gaussian, which lets it capture facial deformation details that LBS alone misses and improves the quality and expressiveness of the resulting 3D avatars.