Abstract: Recent advances in text-guided 3D avatar generation have made substantial progress by distilling knowledge from diffusion models. Despite the plausible generated appearance, existing methods cannot achieve fine-grained disentanglement or high-fidelity modeling between inner body and outfit. In this paper, we propose Barbie, a novel framework for generating 3D avatars that can be dressed in diverse and high-quality Barbie-like garments and accessories. Instead of relying on a holistic model, Barbie achieves fine-grained disentanglement on avatars by semantic-aligned separated models for human body and outfits. These disentangled 3D representations are then optimized by different expert models to guarantee the domain-specific fidelity. To balance geometry diversity and reasonableness, we propose a series of losses for template-preserving and human-prior evolving. The final avatar is enhanced by unified texture refinement for superior texture consistency. Extensive experiments demonstrate that Barbie outperforms existing methods in both dressed human and outfit generation, supporting flexible apparel combination and animation. The code will be released for research purposes. Our project page is: https://xiaokunsun.github.io/Barbie.github.io/

What problem does this paper attempt to address?

### What problems does this paper attempt to solve?

This paper aims to solve several key problems in current 3D virtual avatar generation:

1. **Fine-grained Decoupling and High-Fidelity Modeling**:
   - Existing methods cannot achieve fine-grained disentanglement between the human body and clothing when generating 3D virtual avatars. As a result, the generated clothing and accessories lack detail and are difficult to recombine flexibly.
   - The paper proposes a new framework named Barbie. It uses separate, semantically aligned models for the human body and the clothing, which achieves fine-grained decoupling and ensures domain-specific realism.
2. **Geometric Diversity and Rationality**:
   - To balance geometric diversity and rationality, the paper introduces a series of loss functions, such as a template-preserving loss and a human-prior evolving loss, so that the generated 3D virtual avatars are both diverse and consistent with human morphology.
3. **High-Quality Texture Consistency**:
   - In the final avatar synthesis, unified texture refinement improves the consistency of the textures produced by the different expert models, giving a more realistic overall appearance.
4. **Flexibility and Composability**:
   - The 3D virtual avatars generated by the Barbie framework support flexible clothing combinations and animation. Users can freely mix and match clothing and accessories as needed, similar to the design concept of Barbie dolls.

### Main Contributions of the Framework

- **Innovative Generation Framework**: Barbie is presented as the first work to achieve fine-grained text-to-3D virtual avatar generation, and it can generate highly decoupled human bodies, clothing, and accessories.
- **Application of Expert Models**: Applying domain-specific expert diffusion models at different optimization stages improves the realism of the generated geometry and texture.
- **Novel Loss Functions and Strategies**: Multiple new loss functions and optimization strategies are proposed to resolve the geometry and texture conflicts that may occur when combining different expert models.

### Experimental Results

In extensive experiments, Barbie significantly outperforms existing methods in virtual avatar and clothing generation, showing better geometric structure, texture quality, consistency with the text description, and fine-grained decoupling ability. Specifically, as shown in Table 1, Barbie achieves the best or second-best results on multiple evaluation criteria.

### Summary

The Barbie framework addresses the shortcomings of existing 3D virtual avatar generation through fine-grained decoupling, high-quality texture optimization, and flexible clothing combinations, providing new ideas and technical means for future research and applications.
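
The composability described above (a body, clothing, and accessories generated as separate assets that can be freely recombined) can be pictured with a small data-structure sketch. This is only an illustration of the idea, not the paper's code: the `Asset` and `Avatar` classes, the slot names, and the example asset names are all hypothetical, and real assets would hold geometry and textures rather than strings.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Asset:
    """A separately generated 3D asset (hypothetical container, not the paper's API)."""
    name: str
    slot: str  # assumed slot names, e.g. "body", "top", "shoes", "accessory"


@dataclass
class Avatar:
    """A body plus at most one interchangeable outfit asset per slot."""
    body: Asset
    outfit: dict = field(default_factory=dict)

    def dress(self, asset: Asset) -> "Avatar":
        # Return a new avatar with the asset's slot replaced; the body is untouched.
        if asset.slot == "body":
            raise ValueError("create a new Avatar to change the body")
        new_outfit = dict(self.outfit)
        new_outfit[asset.slot] = asset
        return Avatar(body=self.body, outfit=new_outfit)

    def parts(self) -> list:
        # Body first, then outfit assets in alphabetical slot order.
        return [self.body.name] + [self.outfit[s].name for s in sorted(self.outfit)]


body = Asset("athletic woman", "body")
avatar = Avatar(body).dress(Asset("denim jacket", "top")).dress(Asset("sneakers", "shoes"))
swapped = avatar.dress(Asset("red hoodie", "top"))  # swap one garment only

print(avatar.parts())
print(swapped.parts())
```

Because each garment is its own asset, swapping the top leaves the body and the shoes unchanged, which is the property a single holistic avatar model cannot offer.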

Barbie: Text to Barbie-Style 3D Avatars

Disentangled Clothed Avatar Generation from Text Descriptions

Text2Avatar: Text to 3D Human Avatar Generation with Codebook-Driven Body Controllable Attribute

AvatarStudio: High-fidelity and Animatable 3D Avatar Creation from Text

AvatarBooth: High-Quality and Customizable 3D Human Avatar Generation

SEEAvatar: Photorealistic Text-to-3D Avatar Generation with Constrained Geometry and Appearance

DAGSM: Disentangled Avatar Generation with GS-enhanced Mesh

AvatarVerse: High-quality & Stable 3D Avatar Creation from Text and Pose

PuzzleAvatar: Assembling 3D Avatars from Personal Albums

DreamWaltz: Make a Scene with Complex 3D Animatable Avatars

GETAvatar: Generative Textured Meshes for Animatable Human Avatars

AG3D: Learning to Generate 3D Avatars from 2D Image Collections

TADA! Text to Animatable Digital Avatars

X-Oscar: A Progressive Framework for High-quality Text-guided 3D Animatable Avatar Generation

XAGen: 3D Expressive Human Avatars Generation

AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars

GGAvatar: Reconstructing Garment-Separated 3D Gaussian Splatting Avatars from Monocular Video

AvatarFusion: Zero-shot Generation of Clothing-Decoupled 3D Avatars Using 2D Diffusion

Dressing Avatars: Deep Photorealistic Appearance for Physically Simulated Clothing

GenCA: A Text-conditioned Generative Model for Realistic and Drivable Codec Avatars

DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models