Abstract: The problem of modeling an animatable 3D human head avatar under lightweight setups is of significant importance but has not been well solved. Existing 3D representations perform well either in the realism of portrait image synthesis or in the accuracy of expression control, but not both. To address this problem, we introduce a novel hybrid explicit-implicit 3D representation, the Facial Model Conditioned Neural Radiance Field, which integrates the expressiveness of NeRF with the prior information from the parametric template. At the core of our representation, a synthetic-renderings-based conditioning method is proposed to fuse the prior information from the parametric model into the implicit field without constraining its topological flexibility. In addition, based on the hybrid representation, we overcome the shape-inconsistency issue present in existing methods and improve animation stability. Moreover, by adopting an overall GAN-based architecture with an image-to-image translation network, we achieve high-resolution, realistic, and view-consistent synthesis of dynamic head appearance. Experiments demonstrate that our method achieves state-of-the-art performance for 3D head avatar animation compared with previous methods.
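
The synthetic-renderings-based conditioning in the abstract starts from images rendered from the parametric face model. The sketch below illustrates only that first step, under simplifying assumptions:

- `render_orthographic` is a hypothetical name.
- Vertices are splatted as points with a z-buffer instead of rasterizing triangles.
- Each vertex carries a single scalar attribute.

It is not the paper's renderer.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def render_orthographic(
    vertices: Sequence[Vec3],
    attributes: Sequence[float],
    resolution: int,
    extent: float = 1.0,
) -> List[List[Optional[float]]]:
    """Hypothetical front-view orthographic renderer (vertex splatting + z-buffer).

    The camera sits on the +z axis looking down -z, so projection simply drops z
    (no perspective divide). Coordinates in [-extent, extent] map onto the pixel
    grid, with row 0 at the top (largest y). Pixels hit by no vertex stay None.
    """
    image: List[List[Optional[float]]] = [[None] * resolution for _ in range(resolution)]
    depth = [[-math.inf] * resolution for _ in range(resolution)]
    for (x, y, z), value in zip(vertices, attributes):
        col = math.floor((x + extent) / (2 * extent) * resolution)
        row = math.floor((extent - y) / (2 * extent) * resolution)
        if not (0 <= row < resolution and 0 <= col < resolution):
            continue  # vertex falls outside the rendered window
        if z > depth[row][col]:  # keep the vertex closest to the camera
            depth[row][col] = z
            image[row][col] = value
    return image


# Two vertices share a pixel; the one with larger z (closer to the camera) wins.
img = render_orthographic(
    vertices=[(0.0, 0.0, 0.1), (0.0, 0.0, 0.5), (0.9, 0.9, 0.0)],
    attributes=[1.0, 2.0, 3.0],
    resolution=4,
)
```

Side views can be produced with the same function after rotating the vertices by ±90° about the y-axis. Which attributes to render (depth, normals, or learned per-vertex features) is left open here as an assumption.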

What problem does this paper attempt to address?

This paper addresses how to create an animatable, high-fidelity 3D human head avatar under a lightweight setup while achieving both fine-grained control of facial expressions and realistic synthesis of portrait images. Existing 3D representations perform well either in the realism of portrait image synthesis or in the accuracy of expression control, but it is difficult to achieve both simultaneously. To solve this, the paper introduces a new hybrid explicit-implicit 3D representation, the **Facial Model Conditioned Neural Radiance Field**.

### Main Contributions

1. **A new Facial Model Conditioned Neural Radiance Field (NeRF)** for personalized 3D human head avatars. The representation is built on a feature volume obtained from orthogonal synthetic renderings of the facial model. It handles topological structures flexibly while precisely controlling head movements and facial expressions.
2. **A new strategy based on conditional embeddings** that addresses the shape-inconsistency problem found in existing NeRF-based avatar modeling methods, significantly improving animation stability.
3. **High-resolution, realistic, and view-consistent synthesis of dynamic head appearance, achieved for the first time**, using an overall GAN architecture that combines an efficient avatar representation with an image-to-image translation module.
4. **Head avatars from both monocular and multi-view videos**. Besides learning head avatars from monocular videos, the paper also models head avatars from multi-view videos (captured with 6 cameras). Experiments show that the method outperforms modified versions of other state-of-the-art methods.

### Method Overview

1. **NeRF conditioned on the parametric model**:
   - **Definition**: \( H_C \) is redefined so that it depends not only on the position \( x_c \) and the per-frame embedding \( \gamma_t \), but also on the tracked, deformed facial mesh \( M_t \) and the head pose \( p_t \), i.e., \( H_C(x_c, \gamma_t, M_t, p_t) \).
   - **Synthetic-rendering feature volume**: A feature volume describing head appearance in canonical space is generated by orthogonally rendering the facial model in its zero pose. The head avatar is characterized by the planar features of the front view and the two side views.
   - **Learnable conditional embedding**: Additional conditional embeddings address misalignment between the tracked facial model and the actual observations, preventing shape inconsistency across frames.
2. **Head movement decoupling module**:
   - **Rigid skeleton deformation**: Rigid head motion is handled through the estimated head pose, avoiding unrealistic motion in which the head and body move in lockstep.
   - **Linear blend skinning weights**: These are used to compute the rigid head deformation \( T \), which deforms the canonical-space appearance volume \( H_C \) into the observation-space appearance volume \( H \).

### Experimental Results

Experiments show that the proposed method outperforms the compared methods in high-resolution, realistic, and view-consistent dynamic head appearance synthesis. It is notably more robust when synthesizing images under large head rotation angles.

### Conclusion

The paper proposes the Facial Model Conditioned Neural Radiance Field. This method addresses the problem of creating high-fidelity 3D human head avatars under a lightweight setup, achieving fine-grained control of facial expressions and realistic synthesis of portrait images.
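
The feature volume in the Method Overview is described as planar features from a front view and two side views. A common way to read such axis-aligned planes, as in tri-plane representations, works in three steps:

1. Project a 3D point onto each plane.
2. Bilinearly interpolate the plane's features at that location.
3. Combine the per-plane results.

The sketch below shows this lookup in plain Python. Several choices are assumptions rather than details from the paper:

- the plane-to-axis convention;
- mirroring the right view;
- concatenation as the way features are combined.

`sample_conditioned_features` is a hypothetical name. The decoder that would turn these features (together with inputs such as \( \gamma_t \)) into density and color is omitted.

```python
from typing import List, Tuple

Plane = List[List[List[float]]]  # plane[row][col] -> feature vector


def bilinear_sample(plane: Plane, u: float, v: float) -> List[float]:
    """Bilinearly interpolate a feature plane (at least 2x2) at normalized coords.

    u indexes columns and v indexes rows; both live in [-1, 1] (values outside are
    clamped), with -1 and +1 landing exactly on the first and last pixel centers.
    """
    h, w = len(plane), len(plane[0])
    x = min(max((u + 1) / 2 * (w - 1), 0.0), w - 1.0)
    y = min(max((v + 1) / 2 * (h - 1), 0.0), h - 1.0)
    x0, y0 = min(int(x), w - 2), min(int(y), h - 2)  # top-left neighbor
    tx, ty = x - x0, y - y0  # fractional offsets in [0, 1]
    out = []
    for c in range(len(plane[0][0])):
        top = (1 - tx) * plane[y0][x0][c] + tx * plane[y0][x0 + 1][c]
        bottom = (1 - tx) * plane[y0 + 1][x0][c] + tx * plane[y0 + 1][x0 + 1][c]
        out.append((1 - ty) * top + ty * bottom)
    return out


def sample_conditioned_features(
    point: Tuple[float, float, float], front: Plane, left: Plane, right: Plane
) -> List[float]:
    """Look up a canonical-space point in three orthogonal feature planes and concatenate.

    Assumed (hypothetical) convention: the front plane is indexed by (x, y); both side
    planes are indexed by (z, y), with the right view mirrored so its columns run along -z.
    """
    x, y, z = point
    return (
        bilinear_sample(front, x, y)
        + bilinear_sample(left, z, y)
        + bilinear_sample(right, -z, y)
    )


front = [[[0.0], [1.0]], [[2.0], [3.0]]]
left = [[[10.0], [10.0]], [[10.0], [10.0]]]
right = [[[20.0], [20.0]], [[20.0], [20.0]]]
features = sample_conditioned_features((0.0, 0.0, 0.3), front, left, right)
```

Concatenation keeps each view's features separate for the decoder. Summation, as some tri-plane methods use, would be an equally plausible choice.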
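
The head movement decoupling module uses linear blend skinning (LBS) to turn the estimated head pose into a rigid head deformation \( T \). The sketch below shows standard LBS on a single point: per-bone 4×4 transforms are blended with the point's skinning weights, and the blended transform is then applied. The following are hypothetical choices for illustration, not details from the paper:

- the two-bone setup (a fixed body and a head yawing about a neck joint);
- the joint location;
- the helper names.

```python
import math
from typing import List, Sequence, Tuple

Matrix4 = List[List[float]]
Vec3 = Tuple[float, float, float]

IDENTITY: Matrix4 = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]


def head_rotation_about_joint(yaw: float, joint: Vec3) -> Matrix4:
    """Rigid 4x4 transform rotating by `yaw` radians about the y-axis through `joint`."""
    c, s = math.cos(yaw), math.sin(yaw)
    r = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
    # x' = R (x - joint) + joint, so the translation part is joint - R @ joint.
    t = [joint[i] - sum(r[i][k] * joint[k] for k in range(3)) for i in range(3)]
    return [r[0] + [t[0]], r[1] + [t[1]], r[2] + [t[2]], [0.0, 0.0, 0.0, 1.0]]


def lbs_deform(point: Vec3, weights: Sequence[float], transforms: Sequence[Matrix4]) -> Vec3:
    """Linear blend skinning: blend per-bone transforms by the point's weights, then apply.

    Weights are assumed non-negative and summing to 1.
    """
    blended = [
        [sum(w * m[i][j] for w, m in zip(weights, transforms)) for j in range(4)]
        for i in range(4)
    ]
    x, y, z = point
    return tuple(
        blended[i][0] * x + blended[i][1] * y + blended[i][2] * z + blended[i][3]
        for i in range(3)
    )


# Hypothetical bones: body (fixed) and head (yawed 90 degrees about a neck joint at the origin).
bones = [IDENTITY, head_rotation_about_joint(math.pi / 2, (0.0, 0.0, 0.0))]
on_head = lbs_deform((1.0, 0.0, 0.0), [0.0, 1.0], bones)  # follows the head
on_body = lbs_deform((1.0, 0.0, 0.0), [1.0, 0.0], bones)  # stays put
on_neck = lbs_deform((1.0, 0.0, 0.0), [0.5, 0.5], bones)  # blended between the two
```

The half-weighted point lands about 0.71 from the joint instead of 1. This is the well-known volume-loss artifact of linear blending. A NeRF renderer typically also needs the inverse (observation-to-canonical) mapping to warp its sample points; this sketch shows only the forward blend.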
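
For the GAN-based pipeline, the summary pairs the avatar representation with an image-to-image translation module. This sketch assumes a design that is common in hybrid NeRF-plus-2D-network pipelines:

- the conditioned radiance field is first volume-rendered into a low-resolution feature image;
- the translation network then upsamples that image.

Under that assumption, each pixel of the feature image comes from the standard NeRF compositing rule sketched below. The feature size and sample spacing are arbitrary, and `composite_ray` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple


def composite_ray(
    sigmas: Sequence[float],
    features: Sequence[Sequence[float]],
    deltas: Sequence[float],
) -> Tuple[List[float], float]:
    """NeRF-style alpha compositing of per-sample features along one ray.

    alpha_i = 1 - exp(-sigma_i * delta_i) and T_i = prod_{j<i} (1 - alpha_j);
    returns (sum_i T_i * alpha_i * f_i, accumulated opacity 1 - T_N).
    """
    transmittance = 1.0
    out = [0.0] * len(features[0])
    for sigma, feat, delta in zip(sigmas, features, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha  # contribution of this sample
        for c, value in enumerate(feat):
            out[c] += weight * value
        transmittance *= 1.0 - alpha  # light remaining after this sample
    return out, 1.0 - transmittance


# Two samples, each absorbing half of the light that reaches it.
feature, opacity = composite_ray(
    sigmas=[math.log(2.0), math.log(2.0)],
    features=[[1.0, 0.0], [3.0, 2.0]],
    deltas=[1.0, 1.0],
)
```

Here the first sample receives weight 0.5 and the second 0.25, so the composited feature is a weighted mix that favors the nearer sample.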

HAvatar: High-fidelity Head Avatar via Facial Model Conditioned Neural Radiance Field

High-Fidelity 3D Head Avatars Reconstruction through Spatially-Varying Expression Conditioned Neural Radiance Field

Controllable One-shot Head Avatar Reconstruction

BakedAvatar: Baking Neural Fields for Real-Time Head Avatar Synthesis

Artist-Friendly Relightable and Animatable Neural Heads

LightAvatar: Efficient Head Avatar as Dynamic Neural Light Field

Unified Volumetric Avatar: Enabling Flexible Editing and Rendering of Neural Human Representations

MA-NeRF: Motion-Assisted Neural Radiance Fields for Face Synthesis from Sparse Images

GANHead: Towards Generative Animatable Neural Head Avatars

HVTR++: Image and Pose Driven Human Avatars Using Hybrid Volumetric-Textural Rendering

HVTR: Hybrid Volumetric-Textural Rendering for Human Avatars

Efficient 3D Implicit Head Avatar with Mesh-anchored Hash Table Blendshapes

Semantic-aware Hyper-Space Deformable Neural Radiance Fields for Facial Avatar Reconstruction

Neural Point-based Volumetric Avatar: Surface-guided Neural Points for Efficient and Photorealistic Volumetric Head Avatar

Learning Personalized High Quality Volumetric Head Avatars from Monocular RGB Videos

High-fidelity Facial Avatar Reconstruction from Monocular Video with Generative Priors

Dynamic Neural Radiance Fields for Monocular 4D Facial Avatar Reconstruction

NOFA: NeRF-based One-shot Facial Avatar Reconstruction

Learning Compositional Radiance Fields of Dynamic Human Heads

Representing Animatable Avatar via Factorized Neural Fields

HQ3DAvatar: High Quality Implicit 3D Head Avatar