HAvatar: High-fidelity Head Avatar via Facial Model Conditioned Neural Radiance Field

Xiaochen Zhao, Lizhen Wang, Jingxiang Sun, Hongwen Zhang, Jinli Suo, Yebin Liu
DOI: https://doi.org/10.48550/arXiv.2309.17128
2023-09-29
Abstract: The problem of modeling an animatable 3D human head avatar under light-weight setups is of significant importance but has not been well solved. Existing 3D representations either perform well in the realism of portrait images synthesis or the accuracy of expression control, but not both. To address the problem, we introduce a novel hybrid explicit-implicit 3D representation, Facial Model Conditioned Neural Radiance Field, which integrates the expressiveness of NeRF and the prior information from the parametric template. At the core of our representation, a synthetic-renderings-based condition method is proposed to fuse the prior information from the parametric model into the implicit field without constraining its topological flexibility. Besides, based on the hybrid representation, we properly overcome the inconsistent shape issue presented in existing methods and improve the animation stability. Moreover, by adopting an overall GAN-based architecture using an image-to-image translation network, we achieve high-resolution, realistic and view-consistent synthesis of dynamic head appearance. Experiments demonstrate that our method can achieve state-of-the-art performance for 3D head avatar animation compared with previous methods.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses how to build an animatable, high-fidelity 3D human head avatar under a lightweight capture setup while achieving both fine-grained control of facial expressions and realistic portrait image synthesis. Existing 3D representations tend to do well either in the realism of the synthesized portrait images or in the accuracy of expression control, but not both. To close this gap, the paper introduces a hybrid explicit-implicit 3D representation: **Facial Model Conditioned Neural Radiance Field**.

### Main Contributions

1. **Propose a new Facial Model Conditioned Neural Radiance Field (NeRF)** for personalized 3D human head avatars. The representation is built on a feature volume derived from orthogonal synthetic renderings, handles flexible topology, and gives precise control over head motion and facial expressions.
2. **Develop a new generator adjustment strategy** that uses a conditional embedding to address the shape inconsistency found in existing NeRF-based avatar modeling methods, improving animation stability.
3. **Achieve high-resolution, realistic, and view-consistent dynamic head appearance synthesis for the first time** by using an overall GAN-based architecture that combines an efficient avatar representation with an image-to-image translation module.
4. **In addition to learning head avatars from monocular videos**, the paper also demonstrates head avatar modeling from multi-view videos (using 6 cameras) and experimentally shows superior performance over other modified state-of-the-art methods.

### Method Overview

1. **NeRF under the Parametric Model Condition**:
   - **Definition**: Redefine \( H_C \) in NeRF so that it depends not only on the position \( x_c \) and the per-frame embedding \( \gamma_t \), but also on the tracked deformed mesh model \( M_t \) and the head pose \( p_t \).
   - **Synthetic-Rendering Feature Volume**: Generate a feature volume from orthogonal renderings of the facial model in its zero pose to describe the head appearance in the canonical space. Planar features from the front view and the two side views characterize the head avatar.
   - **Conditional Learnable Embedding**: Introduce additional conditional embeddings to handle misalignment between the tracked facial model and the actual observation and to prevent shape inconsistency between frames.
2. **Head Movement Decoupling Module**:
   - **Rigid Skeleton Deformation**: Handle the rigid movement of the head through the estimated head pose to avoid unrealistic synchronous movements of the head and body.
   - **Linear Blend Skinning Weights**: Compute the head rigid deformation \( T \), which deforms the appearance volume \( H_C \) in the canonical space into the appearance volume \( H \) in the observation space.

### Experimental Results

The experiments support the method's performance in high-resolution, realistic, and view-consistent dynamic head appearance synthesis. In particular, the method is more robust when synthesizing images under large rotation angles.

### Conclusion

The paper proposes a Facial Model Conditioned Neural Radiance Field for creating high-fidelity 3D human head avatars under a lightweight setup, combining fine-grained control of facial expressions with realistic synthesis of portrait images.
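
### Illustrative Sketch

The two ideas in the method overview, planar features from three orthogonal views and a rigid head deformation \( T \) between canonical and observation space, can be pictured with a small NumPy sketch. This is not the authors' implementation. The function names (`bilinear_sample`, `to_canonical`, `query_features`), the axis conventions for the front and side planes, the mirrored axis for the second side view, and the use of concatenation to fuse the three features are all assumptions made for illustration. In the paper the planes come from an encoder applied to synthetic renderings; here they are plain arrays.

```python
import numpy as np

def bilinear_sample(plane, uv):
    """Bilinearly sample an (H, W, C) feature plane at uv coords in [-1, 1]."""
    h, w, _ = plane.shape
    # Map normalized coordinates to pixel coordinates and clamp to the plane.
    px = np.clip((uv[:, 0] + 1.0) * 0.5 * (w - 1), 0, w - 1)
    py = np.clip((uv[:, 1] + 1.0) * 0.5 * (h - 1), 0, h - 1)
    x0 = np.floor(px).astype(int)
    y0 = np.floor(py).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (px - x0)[:, None]
    wy = (py - y0)[:, None]
    top = plane[y0, x0] * (1 - wx) + plane[y0, x1] * wx
    bottom = plane[y1, x0] * (1 - wx) + plane[y1, x1] * wx
    return top * (1 - wy) + bottom * wy

def to_canonical(x_obs, rigid_T):
    """Undo a 4x4 rigid head transform: observation space -> canonical space."""
    rot, trans = rigid_T[:3, :3], rigid_T[:3, 3]
    # With points stored as rows, (x - t) @ R applies R^T to each point.
    return (x_obs - trans) @ rot

def query_features(x_obs, rigid_T, front, left, right):
    """Look up per-point features from three orthogonal canonical planes.

    Assumed conventions (hypothetical): the front plane spans (x, y), both
    side planes span (z, y), and the right view mirrors the horizontal axis.
    """
    x_c = to_canonical(x_obs, rigid_T)
    f_front = bilinear_sample(front, x_c[:, [0, 1]])
    f_left = bilinear_sample(left, x_c[:, [2, 1]])
    f_right = bilinear_sample(right, x_c[:, [2, 1]] * np.array([-1.0, 1.0]))
    # Concatenation is one simple fusion choice; a decoder MLP would follow.
    return np.concatenate([f_front, f_left, f_right], axis=1)
```

Calling `query_features(points, T, front, left, right)` with `points` of shape `(N, 3)` and three `(H, W, C)` planes returns an `(N, 3C)` array. In a full pipeline, this array, together with the per-frame embedding, would be decoded into density and color or features for volume rendering.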