Abstract: Photo-realistic and controllable 3D avatars are crucial for various applications such as virtual and mixed reality (VR/MR), telepresence, gaming, and film production. Traditional methods for avatar creation often involve time-consuming scanning and reconstruction processes for each avatar, which limits their scalability. Furthermore, these methods do not offer the flexibility to sample new identities or modify existing ones. In contrast, by learning a strong prior from data, generative models provide a promising alternative to traditional reconstruction methods, easing the time constraints for both data capture and processing. Additionally, generative methods enable downstream applications beyond reconstruction, such as editing and stylization. Nonetheless, research on generative 3D avatars is still in its infancy, and current methods still have limitations such as creating only static avatars, lacking photo-realism, having incomplete facial details, or having limited drivability. To address these limitations, we propose a text-conditioned generative model that can generate photo-realistic facial avatars of diverse identities, with more complete details such as hair, eyes, and mouth interior, and which can be driven through a powerful non-parametric latent expression space. Specifically, we integrate the generative and editing capabilities of latent diffusion models with a strong prior model for avatar expression driving. Our model can generate and control high-fidelity avatars, even those that are out-of-distribution. We also highlight its potential for downstream applications, including avatar editing and single-shot avatar reconstruction.

GG-Editor: Locally Editing 3D Avatars with Multimodal Large Language Model Guidance

DynamicAvatars: Accurate Dynamic Facial Avatars Reconstruction and Precise Editing with Diffusion Models

HeadSculpt: Crafting 3D Head Avatars with Text

GeneAvatar: Generic Expression-Aware Volumetric Head Avatar Editing from a Single Image

Learning Locally Editable Virtual Humans

Chat-Edit-3D: Interactive 3D Scene Editing via Text Prompts

GGAvatar: Geometric Adjustment of Gaussian Head Avatar

AttriHuman-3D: Editable 3D Human Avatar Generation with Attribute Decomposition and Indexing

GenCA: A Text-conditioned Generative Model for Realistic and Drivable Codec Avatars

$E^{3}$Gen: Efficient, Expressive and Editable Avatars Generation

SGEdit: Bridging LLM with Text2Image Generative Model for Scene Graph-based Image Editing

AvatarStudio: Text-driven Editing of 3D Dynamic Human Head Avatars

Guide3D: Create 3D Avatars from Text and Image Guidance

MaTe3D: Mask-guided Text-based 3D-aware Portrait Editing

HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting

Avatar Concept Slider: Manipulate Concepts In Your Human Avatar With Fine-grained Control

X-Oscar: A Progressive Framework for High-quality Text-guided 3D Animatable Avatar Generation

InstructHumans: Editing Animated 3D Human Textures with Instructions

Zero-shot Text-driven Physically Interpretable Face Editing

SEEAvatar: Photorealistic Text-to-3D Avatar Generation with Constrained Geometry and Appearance

DivAvatar: Diverse 3D Avatar Generation with a Single Prompt