Disentangled Clothed Avatar Generation from Text Descriptions

Jionghao Wang,Yuan Liu,Zhiyang Dou,Zhengming Yu,Yongqing Liang,Cheng Lin,Xin Li,Wenping Wang,Rong Xie,Li Song
2024-09-27
Abstract:In this paper, we introduce a novel text-to-avatar generation method that separately generates the human body and the clothes and allows high-quality animation on the generated avatar. While recent advancements in text-to-avatar generation have yielded diverse human avatars from text prompts, these methods typically combine all elements-clothes, hair, and body-into a single 3D representation. Such an entangled approach poses challenges for downstream tasks like editing or animation. To overcome these limitations, we propose a novel disentangled 3D avatar representation named Sequentially Offset-SMPL (SO-SMPL), building upon the SMPL model. SO-SMPL represents the human body and clothes with two separate meshes but associates them with offsets to ensure the physical alignment between the body and the clothes. Then, we design a Score Distillation Sampling (SDS)-based distillation framework to generate the proposed SO-SMPL representation from text prompts. Our approach not only achieves higher texture and geometry quality and better semantic alignment with text prompts, but also significantly improves the visual quality of character animation, virtual try-on, and avatar editing. Project page: <a class="link-external link-https" href="https://shanemankiw.github.io/SO-SMPL/" rel="external noopener nofollow">this https URL</a>.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper attempts to address the problem of generating high-quality and animatable decoupled avatars based on textual descriptions. Specifically, most existing studies, when generating avatars from text prompts, typically merge all elements such as clothes, hair, and body into a single 3D representation, which poses challenges for subsequent editing or animation tasks. Therefore, this paper proposes a novel approach, Sequentially Offset-SMPL (SO-SMPL), which generates separate body and clothing meshes based on the SMPL model. This method not only improves texture and geometric quality but also significantly enhances the visual effects of character animation, virtual try-on, and avatar editing. By using the Score Distillation Sampling (SDS) framework, the method can generate decoupled SO-SMPL representations from text prompts. Experiments show that the avatars generated by this method have higher texture and geometric detail quality and better semantic alignment with the text prompts. Additionally, the generated avatars exhibit highly realistic animation effects in animation simulation environments.