Abstract: Our goal is to create a realistic 3D facial avatar with hair and accessories using only a text description. While this challenge has attracted significant recent interest, existing methods lack realism, produce unrealistic shapes, or do not support editing, such as modifications to the hairstyle. We argue that existing methods are limited because they employ a monolithic modeling approach, using a single representation for the head, face, hair, and accessories. Our observation is that the hair and face, for example, have very different structural qualities that benefit from different representations. Building on this insight, we generate avatars with a compositional model, in which the head, face, and upper body are represented with traditional 3D meshes, and the hair, clothing, and accessories with neural radiance fields (NeRF). The model-based mesh representation provides a strong geometric prior for the face region, improving realism while enabling editing of the person's appearance. By using NeRFs to represent the remaining components, our method is able to model and synthesize parts with complex geometry and appearance, such as curly hair and fluffy scarves. Our novel system synthesizes these high-quality compositional avatars from text descriptions. The experimental results demonstrate that our method, Text-guided generation and Editing of Compositional Avatars (TECA), produces avatars that are more realistic than those of recent methods while being editable because of their compositional nature. For example, TECA enables the seamless transfer of compositional features like hairstyles, scarves, and other accessories between avatars. This capability supports applications such as virtual try-on.

GenCA: A Text-conditioned Generative Model for Realistic and Drivable Codec Avatars

Morphable Diffusion: 3D-Consistent Diffusion for Single-image Avatar Creation

DynamicAvatars: Accurate Dynamic Facial Avatars Reconstruction and Precise Editing with Diffusion Models

XAGen: 3D Expressive Human Avatars Generation

Text-Guided Generation and Editing of Compositional 3D Avatars

StyleAvatar3D: Leveraging Image-Text Diffusion Models for High-Fidelity 3D Avatar Generation

GAN-Avatar: Controllable Personalized GAN-based Human Head Avatar

GeneAvatar: Generic Expression-Aware Volumetric Head Avatar Editing from a Single Image

GETAvatar: Generative Textured Meshes for Animatable Human Avatars

DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models

Auto-CARD: Efficient and Robust Codec Avatar Driving for Real-time Mobile Telepresence

Articulated 3D Head Avatar Generation using Text-to-Image Diffusion Models

E^3Gen: Efficient, Expressive and Editable Avatars Generation

ExpAvatar: High-Fidelity Avatar Generation of Unseen Expressions with 3D Face Priors

AvatarStudio: High-fidelity and Animatable 3D Avatar Creation from Text

Human 3Diffusion: Realistic Avatar Creation via Explicit 3D Consistent Diffusion Models

DivAvatar: Diverse 3D Avatar Generation with a Single Prompt

GANHead: Towards Generative Animatable Neural Head Avatars

From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations