Abstract: Real-time rendering of human head avatars is a cornerstone of many computer graphics applications, such as augmented reality, video games, and films. Recent approaches address this challenge with computationally efficient geometry primitives in a carefully calibrated multi-view setup. Although they produce photorealistic head renderings, they often fail to represent complex motion, such as changes in the mouth interior and strongly varying head poses. We propose a new method to generate highly dynamic and deformable human head avatars from multi-view imagery in real time. At the core of our method is a hierarchical representation of head models that captures the complex dynamics of facial expressions and head movements. First, with rich facial features extracted from raw input frames, we learn to deform the coarse facial geometry of the template mesh. We then initialize 3D Gaussians on the deformed surface and refine their positions in a fine step. We train this coarse-to-fine facial avatar model, with the head pose as a learnable parameter, in an end-to-end framework. This enables not only controllable facial animation via video inputs, but also high-fidelity novel view synthesis of challenging facial expressions, such as tongue deformations and fine-grained teeth structure under large motion changes. Moreover, it encourages the learned head avatar to generalize to new facial expressions and head poses at inference time. We demonstrate the performance of our method through comparisons against related methods on different datasets, spanning challenging facial expression sequences across multiple identities. We also demonstrate a potential application of our approach: cross-identity facial performance transfer.
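
The coarse-to-fine idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`deform_template`, `init_gaussian_centers`, `refine_positions`), the choice of one Gaussian per triangle centroid, and the fixed offset values are all assumptions made for illustration. In the actual method the offsets would be learned from image features rather than given as constants.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def add(a: Vec3, b: Vec3) -> Vec3:
    """Component-wise sum of two 3D vectors."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def deform_template(vertices: Sequence[Vec3], coarse_offsets: Sequence[Vec3]) -> List[Vec3]:
    """Coarse step (hypothetical): shift each template vertex by its offset."""
    return [add(v, d) for v, d in zip(vertices, coarse_offsets)]


def init_gaussian_centers(vertices: Sequence[Vec3],
                          faces: Sequence[Tuple[int, int, int]]) -> List[Vec3]:
    """Place one Gaussian center at the centroid of each triangle (an assumption)."""
    centers = []
    for i, j, k in faces:
        a, b, c = vertices[i], vertices[j], vertices[k]
        centers.append((
            (a[0] + b[0] + c[0]) / 3.0,
            (a[1] + b[1] + c[1]) / 3.0,
            (a[2] + b[2] + c[2]) / 3.0,
        ))
    return centers


def refine_positions(centers: Sequence[Vec3], fine_offsets: Sequence[Vec3]) -> List[Vec3]:
    """Fine step (hypothetical): nudge each Gaussian center off the surface."""
    return [add(c, d) for c, d in zip(centers, fine_offsets)]


# Toy template: a tetrahedron with four vertices and four triangular faces.
template_vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]

# Placeholder offsets; a real system would predict these with a trained network.
coarse_offsets = [(0.0, 0.0, 0.0)] * 3 + [(0.0, 0.0, 0.5)]
fine_offsets = [(0.01, 0.01, 0.01)] * len(faces)

deformed = deform_template(template_vertices, coarse_offsets)
centers = init_gaussian_centers(deformed, faces)
positions = refine_positions(centers, fine_offsets)
```

Because the Gaussian centers are derived from the deformed mesh, any coarse deformation moves them along with the surface, and the fine offsets only need to model the remaining small corrections.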

GaussianSpeech: Audio-Driven Gaussian Avatars

GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians

GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting

GaussianTalker: Speaker-specific Talking Head Synthesis via 3D Gaussian Splatting

GSTalker: Real-time Audio-Driven Talking Face Generation via Deformable Gaussian Splatting

Audio-driven Talking Face Video Generation with Natural Head Pose

From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations

GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning

FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models

TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting

GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians

DEGAS: Detailed Expressions on Full-Body Gaussian Avatars

Animatable 3D Gaussian: Fast and High-Quality Reconstruction of Multiple Human Avatars

GaussianHead: High-fidelity Head Avatars with Learnable Gaussian Derivation

NPGA: Neural Parametric Gaussian Avatars

Generalizable and Animatable Gaussian Head Avatar

GASP: Gaussian Avatars with Synthetic Priors

Gaussian Deja-vu: Creating Controllable 3D Gaussian Head-Avatars with Enhanced Generalization and Personalization Abilities

GaussianHeads: End-to-End Learning of Drivable Gaussian Head Avatars from Coarse-to-fine Representations

HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting