Abstract: Emerging Metaverse applications demand accessible, accurate, and easy-to-use tools for 3D digital human creation in order to depict different cultures and societies as if in the physical world. Recent large-scale vision-language advances pave the way for novices to conveniently customize 3D content. However, the generated CG-friendly assets still cannot represent the desired facial traits for human characteristics. In this paper, we present DreamFace, a progressive scheme to generate personalized 3D faces under text guidance. It enables lay users to naturally customize 3D facial assets that are compatible with CG pipelines, with desired shapes, textures, and fine-grained animation capabilities. From a text input describing the facial traits, we first introduce a coarse-to-fine scheme to generate the neutral facial geometry with a unified topology. We employ a selection strategy in the CLIP embedding space, and subsequently optimize both the detail displacements and normals using Score Distillation Sampling (SDS) from a generic Latent Diffusion Model (LDM). Then, for neutral appearance generation, we introduce a dual-path mechanism, which combines the generic LDM with a novel texture LDM to ensure both diversity and textural specification in the UV space. We also employ a two-stage optimization that performs SDS in both the latent and image spaces to provide compact priors for fine-grained synthesis. Our generated neutral assets naturally support blendshape-based facial animations. We further improve the animation ability with personalized deformation characteristics by learning a universal expression prior using a cross-identity hypernetwork. Notably, DreamFace can generate realistic 3D facial assets with physically-based rendering quality and rich animation ability from video footage, even for fashion icons or exotic characters in cartoons and fiction movies.
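
The abstract mentions a selection strategy in the CLIP embedding space but gives no details. As a rough illustration only, the sketch below assumes the selection amounts to picking the candidate whose embedding has the highest cosine similarity to the text embedding. The function names (`cosine_similarity`, `select_candidate`) and the toy 3-D vectors are hypothetical stand-ins, not the authors' implementation and not real CLIP outputs.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def select_candidate(text_embedding, candidate_embeddings):
    """Return (index, score) of the candidate closest to the text embedding."""
    scores = [cosine_similarity(text_embedding, c) for c in candidate_embeddings]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Toy 3-D vectors standing in for real CLIP embeddings (hypothetical values).
text_embedding = [1.0, 0.0, 0.0]
candidate_embeddings = [
    [0.0, 1.0, 0.0],   # orthogonal to the text embedding
    [0.9, 0.1, 0.0],   # nearly aligned with the text embedding
    [-1.0, 0.0, 0.0],  # opposite direction
]

best_index, best_score = select_candidate(text_embedding, candidate_embeddings)
print(best_index, round(best_score, 4))
```

In a real pipeline the vectors would come from a CLIP text encoder and from encodings of rendered candidate geometries. The selected candidate would then serve as the coarse starting point for the SDS-based detail optimization described in the abstract.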

DreamReward: Text-to-3D Generation with Human Preference

X-Dreamer: Creating High-quality 3D Content by Bridging the Domain Gap Between Text-to-2D and Text-to-3D Generation

BoostDream: Efficient Refining for High-Quality Text-to-3D Generation from Multi-View Diffusion

Grounded Compositional and Diverse Text-to-3D with Pretrained Multi-View Diffusion Model

DreamControl: Control-Based Text-to-3D Generation with 3D Self-Prior

PlacidDreamer: Advancing Harmony in Text-to-3D Generation

LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching

Retrieval-Augmented Score Distillation for Text-to-3D Generation

ExactDreamer: High-Fidelity Text-to-3D Content Creation via Exact Score Matching

DreamFace: Progressive Generation of Animatable 3D Faces under Text Guidance

EfficientDreamer: High-Fidelity and Robust 3D Creation via Orthogonal-view Diffusion Prior

VividDreamer: Towards High-Fidelity and Efficient Text-to-3D Generation

ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation

AnimatableDreamer: Text-Guided Non-rigid 3D Model Generation and Reconstruction with Canonical Score Distillation

MVReward: Better Aligning and Evaluating Multi-View Diffusion Models with Human Preferences

DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models

OrientDream: Streamlining Text-to-3D Generation with Explicit Orientation Control

VP3D: Unleashing 2D Visual Prompt for Text-to-3D Generation

DreamView: Injecting View-specific Text Guidance into Text-to-3D Generation