Abstract: We present OmniAvatar, a novel geometry-guided 3D head synthesis model trained from in-the-wild unstructured images that is capable of synthesizing diverse identity-preserved 3D heads with compelling dynamic details under fully disentangled control over camera poses, facial expressions, head shapes, and articulated neck and jaw poses. To achieve such a high level of disentangled control, we first explicitly define a novel semantic signed distance function (SDF) around a head geometry (FLAME) conditioned on the control parameters. This semantic SDF allows us to build a differentiable volumetric correspondence map from the observation space to a canonical space that is disentangled from all the control parameters. We then leverage the 3D-aware GAN framework (EG3D) to synthesize the detailed shape and appearance of 3D full heads in the canonical space, followed by a volume rendering step guided by the volumetric correspondence map to produce output in the observation space. To ensure control accuracy on the synthesized head shapes and expressions, we introduce a geometry prior loss to conform to the head SDF and a control loss to conform to the expression code. Further, we enhance the temporal realism with dynamic details conditioned upon varying expressions and joint poses. Compared to state-of-the-art methods, our model synthesizes preferable identity-preserved 3D heads with compelling dynamic details, both qualitatively and quantitatively. We also provide an ablation study to justify many of our system design choices.
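The rendering step described in the abstract can be illustrated with a minimal sketch: samples along a camera ray live in the observation space, each is mapped into the canonical space, and only then is the radiance field queried. Everything below is a toy under stated assumptions, not the paper's implementation: `warp_to_canonical` is a hypothetical rigid shift standing in for the learned correspondence map, `canonical_field` is a constant-colour sphere standing in for the EG3D generator, and the compositing uses the standard NeRF quadrature rule.

```python
import math


def warp_to_canonical(x, offset):
    """Hypothetical stand-in for the correspondence map: a rigid shift."""
    return tuple(xi - oi for xi, oi in zip(x, offset))


def canonical_field(x_bar):
    """Toy canonical radiance field: a dense unit sphere with one colour."""
    inside = math.sqrt(sum(c * c for c in x_bar)) < 1.0
    density = 10.0 if inside else 0.0
    return density, (0.8, 0.6, 0.5)


def render_ray(origin, direction, offset, near=0.0, far=4.0, n_samples=64):
    """Composite colour along one ray with midpoint samples."""
    delta = (far - near) / n_samples
    transmittance = 1.0
    color = [0.0, 0.0, 0.0]
    opacity = 0.0
    for i in range(n_samples):
        t = near + (i + 0.5) * delta
        # Sample point in the observation space.
        x = tuple(o + t * d for o, d in zip(origin, direction))
        # Query the field at the corresponding canonical point.
        sigma, rgb = canonical_field(warp_to_canonical(x, offset))
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        color = [c + weight * k for c, k in zip(color, rgb)]
        opacity += weight
        transmittance *= 1.0 - alpha
    return color, opacity
```

Changing `offset` moves the rendered object in the observation space while `canonical_field` stays untouched, which is the sense in which the canonical content is kept separate from the control parameters in this toy.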

What problem does this paper attempt to address?

### Problems the paper attempts to solve

The paper "OmniAvatar: Geometry-Guided Controllable 3D Head Synthesis" aims to solve the following problems:

1. **High-precision 3D head synthesis**: Existing 3D head synthesis methods fall short in controlling camera view, facial expressions, head shape, and neck and jaw poses, and cannot achieve highly disentangled control. This paper proposes a new geometry-guided 3D head synthesis model that achieves fine-grained control over all of these.
2. **Generation of dynamic details**: Existing methods perform poorly at generating dynamic details (such as wrinkles and shading changes), especially as expressions and poses change. This paper enhances dynamic details by conditioning generation on noised expression and pose parameters, making the synthesized 3D head more realistic.
3. **High-quality image synthesis**: Existing 3D head synthesis methods still have room for improvement in image quality. This paper combines 3D-aware generative adversarial networks (3D GANs) with neural radiance field (NeRF) techniques to achieve high-quality, multi-view-consistent image synthesis.
4. **3D reconstruction from single-view images**: Existing methods usually require multi-view data for 3D reconstruction, while the method in this paper achieves high-quality 3D head reconstruction from a single-view image and supports multi-view-consistent head reenactment.

### Main contributions

- **Novel geometry-guided 3D GAN framework**: Achieves comprehensive control of camera view, facial expressions, head shape, and neck and jaw poses.
- **Semantic signed distance function (SDF)**: Defines a volumetric correspondence map from the observation space to the canonical space, allowing the control parameters to be fully disentangled in 3D GAN training.
- **Geometry prior loss and control loss**: Ensure the accuracy of the synthesized 3D head shape and expressions.
- **Noise-conditioned expressions and poses**: Conditioning on noised expression and pose parameters enhances the generation of dynamic details and improves temporal consistency.

### Method overview

1. **Semantic signed distance function (SDF)**:
   - Defines a new semantic signed distance function \(W(x \mid p = (\alpha, \beta, \theta)) = (s, \bar{x})\), where \(\alpha\) and \(\beta\) are the linear shape and expression blendshape coefficients respectively, and \(\theta\) controls the 3-DoF rotations of the jaw and neck joints.
   - Given a point \(x\) in the observation space, the function \(W\) returns its corresponding point \(\bar{x}\) in the canonical space and its nearest signed distance \(s(x \mid p)\) to the FLAME mesh surface.
2. **Canonical generation and geometry prior**:
   - Uses the pre-trained semantic SDF model \(W(x \mid p)\) to model shape and expression changes, and uses a tri-plane representation to generate 3D-aware human heads.
   - Introduces a geometry prior loss \(L_{\text{prior}}\) to guide the generated neural radiance density field so that it conforms to the FLAME head geometry.
3. **Fine-grained expression control**:
   - Uses an image-level supervision loss \(L_{\text{enc}}\) to improve the precision of expression control and ensure that the expression in the synthesized image is consistent with the input control parameters.
4. **Dynamic detail modeling**:
   - Conditions the MLP decoder on the noised expression coefficients \(\beta\) and joint poses \(\theta\), so that the synthesized 3D head shows more realistic details as expressions and poses change.

### Experimental results

- **Quantitative comparison**: The method outperforms existing 2D and 3D controllable image synthesis methods in both image quality and control disentanglement.
- **Ablation study**: Verifies the role of the geometry prior loss \(L_{\text{prior}}\) and the self-supervised reconstruction loss \(L_{\text{enc}}\) in improving the precision of shape and expression control.

### Conclusion

This paper proposes OmniAvata
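
The semantic SDF and the geometry prior from the method overview can be made concrete with a small sketch. It is a toy under loud assumptions: a sphere whose `radius` and `center` play the role of the control parameters \(p\) replaces the FLAME mesh, the canonical shape is taken to be the unit sphere, and the prior uses a logistic SDF-to-density conversion with a near-surface weight. The conversion, the weight, and the names `kappa` and `gamma` are illustrative choices rather than the paper's exact formulation.

```python
import math


def semantic_sdf(x, radius, center):
    """Toy W(x | p) -> (s, x_bar) for a sphere standing in for the head mesh."""
    d = [xi - ci for xi, ci in zip(x, center)]
    dist = math.sqrt(sum(v * v for v in d))
    s = dist - radius                    # signed distance to the posed surface
    x_bar = tuple(v / radius for v in d)  # maps the posed sphere to the unit sphere
    return s, x_bar


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sdf_to_density(s, kappa=0.05):
    """Assumed conversion: high density inside (s < 0), near zero outside."""
    return _sigmoid(-s / kappa) / kappa


def geometry_prior_loss(points, predicted_density, radius, center,
                        kappa=0.05, gamma=10.0):
    """Mean weighted L1 gap between predicted and SDF-derived density."""
    total = 0.0
    for x, sigma in zip(points, predicted_density):
        s, _ = semantic_sdf(x, radius, center)
        target = sdf_to_density(s, kappa)
        weight = math.exp(-gamma * abs(s))  # assumed: emphasise the surface
        total += weight * abs(sigma - target)
    return total / len(points)
```

In this toy, a generator whose density already matches the SDF-derived target incurs zero loss, while one that predicts empty space at the surface is penalised most strongly there.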

OmniAvatar: Geometry-Guided Controllable 3D Head Synthesis

PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360°

HQ3DAvatar: High Quality Implicit 3D Head Avatar

TimeWalker: Personalized Neural Space for Lifelong Head Avatars

HQ3DAvatar: High Quality Controllable 3D Head Avatar

GANHead: Towards Generative Animatable Neural Head Avatars

Learning Personalized High Quality Volumetric Head Avatars from Monocular RGB Videos

MonoGaussianAvatar: Monocular Gaussian Point-based Head Avatar

HeadSculpt: Crafting 3D Head Avatars with Text

GPHM: Gaussian Parametric Head Model for Monocular Head Avatar Reconstruction

GAN-Avatar: Controllable Personalized GAN-based Human Head Avatar

Head360: Learning a Parametric 3D Full-Head for Free-View Synthesis in 360°

HAvatar: High-fidelity Head Avatar via Facial Model Conditioned Neural Radiance Field

Portrait4D: Learning One-Shot 4D Head Avatar Synthesis using Synthetic Data

HeadGAP: Few-shot 3D Head Avatar via Generalizable Gaussian Priors

FATE: Full-head Gaussian Avatar with Textural Editing from Monocular Video

Towards Native Generative Model for 3D Head Avatar

Learning to regulate 3D head shape by removing occluding hair from in-the-wild images

3D Gaussian Parametric Head Model

RenderMe-360: A Large Digital Asset Library and Benchmarks Towards High-fidelity Head Avatars

Topology-aware Human Avatars with Semantically-guided Gaussian Splatting