Abstract: 3D human generation from 2D images has achieved remarkable progress through the synergistic utilization of neural rendering and generative models. Existing 3D human generative models mainly generate a clothed 3D human as an inseparable 3D model in a single pass, while rarely considering the layer-wise nature of a clothed human body, which often consists of the human body and various clothes such as underwear, outerwear, trousers, shoes, etc. In this work, we propose HumanLiff, the first layer-wise 3D human generative model with a unified diffusion process. Specifically, HumanLiff first generates minimally clothed humans, represented by tri-plane features, in a canonical space, and then progressively generates clothes in a layer-wise manner. In this way, 3D human generation is formulated as a sequence of diffusion-based 3D conditional generation steps. To reconstruct more fine-grained 3D humans with the tri-plane representation, we propose a tri-plane shift operation that splits each tri-plane into three sub-planes and shifts these sub-planes to enable feature grid subdivision. To further enhance the controllability of 3D generation with 3D layered conditions, HumanLiff hierarchically fuses tri-plane features and 3D layered conditions to facilitate learning of the 3D diffusion model. Extensive experiments on two layer-wise 3D human datasets, SynBody (synthetic) and TightCap (real-world), validate that HumanLiff significantly outperforms state-of-the-art methods in layer-wise 3D human generation. Our code will be available at https://skhu101.github.io/HumanLiff.
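The formulation above (a minimally clothed body first, then each clothing layer as a diffusion sample conditioned on the human generated so far) can be sketched as a loop of conditional samplers. This is a minimal illustration under stated assumptions, not HumanLiff's implementation. It uses a standard DDPM ancestral sampler with a linear noise schedule and treats each layer as a tri-plane tensor of shape `(3, C, H, W)`. It also assumes the first (body) layer receives an all-zero condition. The trained conditional UNet is replaced by a hypothetical placeholder, `toy_denoiser`, that always predicts zero noise. The helper names `make_schedule`, `sample_layer`, and `generate_layers` are likewise illustrative.

```python
import numpy as np

def make_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule (an assumption; the paper's schedule may differ)."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def toy_denoiser(x_t, t, cond):
    """Hypothetical placeholder for the trained conditional UNet.

    A real model predicts the noise in x_t given timestep t and condition cond;
    this stub ignores its inputs and predicts zero noise.
    """
    return np.zeros_like(x_t)

def sample_layer(cond, shape, denoiser, rng, T):
    """DDPM ancestral sampling of one tri-plane, conditioned on cond."""
    betas, alphas, alpha_bars = make_schedule(T)
    x = rng.standard_normal(shape)  # start from pure Gaussian noise
    for t in reversed(range(T)):
        eps = denoiser(x, t, cond)
        # Posterior mean of x_{t-1} given the predicted noise eps.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        noise = rng.standard_normal(shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x

def generate_layers(num_layers, shape, denoiser, seed=0, T=50):
    """Generate a body plus clothing layers, each conditioned on the previous result."""
    rng = np.random.default_rng(seed)
    cond = np.zeros(shape)  # assumption: the body layer gets an all-zero condition
    layers = []
    for _ in range(num_layers):
        x = sample_layer(cond, shape, denoiser, rng, T)
        layers.append(x)
        cond = x  # the next layer is generated on top of this one
    return layers
```

For example, `generate_layers(3, (3, 8, 32, 32), toy_denoiser)` returns a list of three tri-plane tensors: the body followed by two clothing layers. In a trained model, `cond` is what carries the previously generated layer into the network, which is what turns each new garment into a conditional generation step.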

What problem does this paper attempt to address?

The paper addresses how to control the generation of each clothing layer when generating 3D clothed humans. Existing 3D human generative models usually produce a fully clothed human in a single pass and rarely consider the layered structure of clothing, such as underwear, outerwear, trousers, and shoes. This one-pass approach is limiting when users want to control each layer of the generation process. For example, in virtual reality (VR) or augmented reality (AR) applications, a user may want to build a game character layer by layer: first generate a minimally clothed body, then progressively select or generate trousers, tops, shoes, and so on. To address this, the paper proposes **HumanLiff**, the first layer-wise 3D human generative model based on a diffusion model. Its main contributions are:

1. **Layer-wise 3D human generation**: Using a diffusion model, HumanLiff generates the human body and then each layer of clothing step by step, letting users control the generation of the body and of every clothing layer.
2. **Tri-plane representation and tri-plane shift operation**: To reconstruct more detailed 3D humans, the paper represents each human with tri-plane features and proposes a tri-plane shift operation. The tri-plane representation encodes a 3D volume with three mutually orthogonal feature planes: a 3D point is projected onto each plane, and the features sampled at the three projections are aggregated. The tri-plane shift operation splits each plane into three sub-planes and shifts these sub-planes, so that 3D points projecting onto the same area can still retrieve different features. This effectively subdivides the feature grid and improves the model's ability to express fine detail.
3. **3D conditional fusion**: To better control 3D generation, HumanLiff uses a 3D conditional UNet encoder to extract multi-scale features from the 3D layered condition and fuses them into the corresponding layers of the diffusion UNet decoder, so that information about the previously generated clothing layers is preserved during generation.

With these components, HumanLiff significantly outperforms existing 3D GAN and diffusion-based methods on layer-wise 3D human generation on two layer-wise 3D human datasets, SynBody (synthetic) and TightCap (real-world). Beyond advancing 3D human generation, this opens new possibilities for personalized and interactive 3D content creation in practical applications.
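A tri-plane feature query, with and without a shift, can be sketched in NumPy as follows. This is a hypothetical illustration of the idea described above, not the paper's code. Several details are assumptions chosen for clarity: the plane-to-axis assignment (xy, xz, yz), summation across planes, a channel-wise split into three sub-planes, and per-sub-plane offsets of 0, 1/3, and 2/3 of a grid cell. `bilinear_sample`, `query_triplane`, and `PLANE_AXES` are illustrative names.

```python
import numpy as np

def bilinear_sample(plane, u, v):
    """Bilinearly sample a (C, H, W) feature plane at normalized coords u, v in [-1, 1].

    Returns an array of shape (C, N). Coordinates are mapped with the
    align-corners convention (-1 -> first texel, +1 -> last texel) and clamped.
    """
    C, H, W = plane.shape
    x = np.clip((u + 1.0) * 0.5 * (W - 1), 0, W - 1)
    y = np.clip((v + 1.0) * 0.5 * (H - 1), 0, H - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    return (plane[:, y0, x0] * (1 - wx) * (1 - wy)
            + plane[:, y0, x1] * wx * (1 - wy)
            + plane[:, y1, x0] * (1 - wx) * wy
            + plane[:, y1, x1] * wx * wy)

# Hypothetical axis pairs for the three orthogonal planes: xy, xz, yz.
PLANE_AXES = ((0, 1), (0, 2), (1, 2))

def query_triplane(triplane, pts, shift=False):
    """Query features for points pts (N, 3) in [-1, 1] from a (3, C, H, W) tri-plane.

    Features from the three planes are summed, giving an (N, C) array.
    With shift=True, each plane's channels are split into three sub-planes and
    sub-plane k is sampled at coordinates offset by k/3 of a grid cell, so points
    that fall inside the same cell can receive different features.
    """
    _, C, H, W = triplane.shape
    out = np.zeros((pts.shape[0], C))
    for p, (a, b) in enumerate(PLANE_AXES):
        u, v = pts[:, a], pts[:, b]
        if not shift:
            out += bilinear_sample(triplane[p], u, v).T
            continue
        for k, idx in enumerate(np.array_split(np.arange(C), 3)):
            du = (k / 3.0) * 2.0 / (W - 1)  # k/3 of a cell width in normalized units
            dv = (k / 3.0) * 2.0 / (H - 1)
            out[:, idx] += bilinear_sample(triplane[p][idx], u + du, v + dv).T
    return out
```

Offsetting the sampling coordinates of a sub-plane is equivalent to shifting that sub-plane in the opposite direction. Because the three sub-planes sample at staggered positions, a single cell of the original grid behaves like a finer grid, which is the intuition behind "feature grid subdivision". Summation is one common aggregation choice for tri-planes; the paper's exact aggregation and shift amounts may differ.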

HumanLiff: Layer-wise 3D Human Generation with Diffusion Model
