Disentangled Clothed Avatar Generation from Text Descriptions

Jionghao Wang, Yuan Liu, Zhiyang Dou, Zhengming Yu, Yongqing Liang, Cheng Lin, Xin Li, Wenping Wang, Rong Xie, Li Song

2024-09-27

Abstract: In this paper, we introduce a novel text-to-avatar generation method that separately generates the human body and the clothes and allows high-quality animation on the generated avatar. While recent advancements in text-to-avatar generation have yielded diverse human avatars from text prompts, these methods typically combine all elements (clothes, hair, and body) into a single 3D representation. Such an entangled approach poses challenges for downstream tasks like editing or animation. To overcome these limitations, we propose a novel disentangled 3D avatar representation named Sequentially Offset-SMPL (SO-SMPL), building upon the SMPL model. SO-SMPL represents the human body and clothes with two separate meshes but associates them with offsets to ensure the physical alignment between the body and the clothes. Then, we design a Score Distillation Sampling (SDS)-based distillation framework to generate the proposed SO-SMPL representation from text prompts. Our approach not only achieves higher texture and geometry quality and better semantic alignment with text prompts, but also significantly improves the visual quality of character animation, virtual try-on, and avatar editing. Project page: https://shanemankiw.github.io/SO-SMPL/

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses the problem of generating high-quality, animatable, disentangled avatars from text descriptions. Most existing text-to-avatar methods merge clothes, hair, and body into a single 3D representation, which makes later editing and animation difficult. The paper instead proposes Sequentially Offset-SMPL (SO-SMPL), a representation built on the SMPL model that keeps the body and the clothes as two separate meshes and links them through offsets so that the clothes stay physically aligned with the body. A Score Distillation Sampling (SDS)-based distillation framework then generates this disentangled SO-SMPL representation from a text prompt. According to the experiments, the generated avatars have higher texture and geometry quality and better semantic alignment with the text prompts than prior methods, and the separation of body and clothes significantly improves the visual quality of character animation, virtual try-on, and avatar editing. The generated avatars also look highly realistic when animated in simulation environments.
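To make the idea of two meshes linked by offsets concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the paper's implementation: the four-vertex template, the helper `offset_along_normals`, the garment mask, and the offset magnitudes are all hypothetical, and the choice to clamp clothing offsets to be non-negative (so the garment never sinks inside the body) is an assumption about how physical alignment could be enforced.

```python
import numpy as np

def offset_along_normals(vertices, normals, magnitudes):
    """Move each vertex along its normal by the given scalar magnitude."""
    return vertices + magnitudes[:, None] * normals

# Hypothetical stand-in for a body template: four vertices on a unit circle.
template = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [-1.0, 0.0, 0.0],
                     [0.0, -1.0, 0.0]])
normals = template / np.linalg.norm(template, axis=1, keepdims=True)

# Step 1: the body mesh is the template plus per-vertex body offsets.
body_offsets = np.full(4, 0.05)
body = offset_along_normals(template, normals, body_offsets)

# Step 2: the clothes mesh is a separate mesh, defined only on a garment
# region of the body and expressed as an offset from the body surface.
# Normals are reused from the template for simplicity (assumption).
garment_mask = np.array([True, True, False, False])
cloth_offsets = np.maximum(np.array([0.02, 0.02]), 0.0)  # keep clothes outside
clothes = offset_along_normals(body[garment_mask],
                               normals[garment_mask],
                               cloth_offsets)

print(body.shape, clothes.shape)
```

Because the clothes are stored relative to the body, changing the body offsets moves the garment with it, while swapping the garment mask and clothing offsets leaves the body untouched. This is the kind of separation that editing and virtual try-on rely on.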

DAGSM: Disentangled Avatar Generation with GS-enhanced Mesh

AvatarFusion: Zero-shot Generation of Clothing-Decoupled 3D Avatars Using 2D Diffusion

AvatarStudio: High-fidelity and Animatable 3D Avatar Creation from Text

Barbie: Text to Barbie-Style 3D Avatars

X-Oscar: A Progressive Framework for High-quality Text-guided 3D Animatable Avatar Generation

SEEAvatar: Photorealistic Text-to-3D Avatar Generation with Constrained Geometry and Appearance

TADA! Text to Animatable Digital Avatars

Text2Avatar: Text to 3D Human Avatar Generation with Codebook-Driven Body Controllable Attribute

DreamWaltz: Make a Scene with Complex 3D Animatable Avatars

DivAvatar: Diverse 3D Avatar Generation with a Single Prompt

ClotheDreamer: Text-Guided Garment Generation with 3D Gaussians

AvatarBooth: High-Quality and Customizable 3D Human Avatar Generation

GETAvatar: Generative Textured Meshes for Animatable Human Avatars

DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D Diffusion

DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models

AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars

Text/Speech-Driven Full-Body Animation

Stratified Avatar Generation from Sparse Observations

LAGA: Layered 3D Avatar Generation and Customization via Gaussian Splatting