Abstract: Emerging Metaverse applications demand accessible, accurate, and easy-to-use tools for creating 3D digital humans that depict different cultures and societies as if in the physical world. Recent large-scale vision-language advances pave the way for novices to conveniently customize 3D content. However, the generated CG-friendly assets still cannot represent the desired facial traits for human characteristics. In this paper, we present DreamFace, a progressive scheme to generate personalized 3D faces under text guidance. It enables lay users to naturally customize 3D facial assets that are compatible with CG pipelines, with desired shapes, textures, and fine-grained animation capabilities. Given a text input describing the facial traits, we first introduce a coarse-to-fine scheme to generate the neutral facial geometry with a unified topology. We employ a selection strategy in the CLIP embedding space, and subsequently optimize both the detail displacements and normals using Score Distillation Sampling (SDS) from a generic Latent Diffusion Model (LDM). Then, for neutral appearance generation, we introduce a dual-path mechanism, which combines the generic LDM with a novel texture LDM to ensure both diversity and textural specification in the UV space. We also employ a two-stage optimization that performs SDS in both the latent and image spaces to provide compact priors for fine-grained synthesis. Our generated neutral assets naturally support blendshape-based facial animations. We further improve the animation ability with personalized deformation characteristics by learning a universal expression prior using a cross-identity hypernetwork. Notably, DreamFace can generate realistic 3D facial assets with physically-based rendering quality and rich animation ability from video footage, even for fashion icons or exotic characters in cartoons and fiction movies.
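
The abstract does not spell out how the selection in the CLIP embedding space works. As a minimal sketch of one plausible reading, the snippet below picks, from a set of candidate shapes, the one whose embedding has the highest cosine similarity to the text embedding. Everything here is an assumption made for illustration: the function name `select_best_candidate` is hypothetical, the 512-dimensional vectors are random stand-ins rather than real CLIP outputs, and the paper's actual strategy may differ.

```python
import numpy as np


def select_best_candidate(text_emb, candidate_embs):
    """Return (index, scores) for the candidate closest to the text embedding.

    Hypothetical helper: closeness is cosine similarity, computed by
    normalizing both sides to unit length and taking dot products.
    """
    text_unit = text_emb / np.linalg.norm(text_emb)
    cand_unit = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    scores = cand_unit @ text_unit  # one cosine score per candidate
    return int(np.argmax(scores)), scores


# Random stand-ins for embeddings of 8 rendered candidate shapes (assumption).
rng = np.random.default_rng(0)
candidate_embs = rng.normal(size=(8, 512))

# Stand-in text embedding, built to lie close to candidate 3 (assumption).
text_emb = candidate_embs[3] + 0.1 * rng.normal(size=512)

best, scores = select_best_candidate(text_emb, candidate_embs)
print(best)
```

In a real pipeline the stand-in vectors would be replaced by embeddings from a CLIP text encoder and image encoder; the selected candidate would then serve as the coarse geometry that later stages refine.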

AnyFace++: A Unified Framework for Free-style Text-to-Face Synthesis and Manipulation

AnyFace: Free-style Text-to-Face Synthesis and Manipulation

FaceChain: A Playground for Identity-Preserving Portrait Generation

Multimodal-driven Talking Face Generation, Face Swapping, Diffusion Model

FaceSwapNet: Landmark Guided Many-to-Many Face Reenactment

DreamFace: Progressive Generation of Animatable 3D Faces under Text Guidance

FaceChain: A Playground for Human-centric Artificial Intelligence Generated Content

ImFace++: A Sophisticated Nonlinear 3D Morphable Face Model with Implicit Neural Representations

Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images

Text-Guided 3D Face Synthesis -- From Generation to Editing

A Generalist FaceX via Learning Unified Facial Representation

Emotional Conversation: Empowering Talking Faces with Cohesive Expression, Gaze and Pose Generation

ExpCLIP: Bridging Text and Facial Expressions via Semantic Alignment

Talking Faces: Audio-to-Video Face Generation

SATFace: Subject Agnostic Talking Face Generation with Natural Head Movement

High-fidelity Generalized Emotional Talking Face Generation with Multi-modal Emotion Space Learning

ChatFace: Chat-Guided Real Face Editing via Diffusion Latent Space Manipulation

FlowFace++: Explicit Semantic Flow-supervised End-to-End Face Swapping

FaceStudio: Put Your Face Everywhere in Seconds

FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio

SynFER: Towards Boosting Facial Expression Recognition with Synthetic Data