Abstract: In this paper, we propose a novel hybrid representation and end-to-end trainable network architecture to model fully editable and customizable neural avatars. At the core of our work lies a representation that combines the modeling power of neural fields with the ease of use and inherent 3D consistency of skinned meshes. To this end, we construct a trainable feature codebook to store local geometry and texture features on the vertices of a deformable body model, thus exploiting its consistent topology under articulation. This representation is then employed in a generative auto-decoder architecture that admits fitting to unseen scans and sampling of realistic avatars with varied appearances and geometries. Furthermore, our representation allows local editing by swapping local features between 3D assets. To verify our method for avatar creation and editing, we contribute a new high-quality dataset, dubbed CustomHumans, for training and evaluation. Our experiments quantitatively and qualitatively show that our method generates diverse detailed avatars and achieves better model fitting performance compared to state-of-the-art methods. Our code and dataset are available at https://custom-humans.github.io/.

What problem does this paper attempt to address?

The paper addresses the problem of creating editable and customizable neural avatars (3D virtual characters). Specifically, it targets three capabilities:

1. **Local Editing Capability**: Enable users to easily transfer partial geometric details and appearance features between different 3D assets, such as changing clothes or modifying patterns on garments.
2. **Customizable Details**: Allow users to directly draw and customize clothing details through 2D to 3D conversion, such as adding logos and letters.
3. **Consistent Local Details**: Ensure that the generated avatars maintain consistent local details when changing poses.

To achieve these goals, the authors propose a novel hybrid representation and an end-to-end trainable network architecture that combines the modeling capabilities of neural fields with the usability and inherent 3D consistency of skinned meshes. Specifically, they construct a trainable feature codebook that stores local geometric and texture features on the vertices of a deformable body model, whose topology stays consistent under articulation. This representation is used in a generative auto-decoder architecture that can be fitted to unseen scans and sampled to generate realistic avatars with diverse appearances and geometries. The representation also supports local editing by swapping local features between 3D assets.

To validate the method's capability in creating and editing avatars, the authors contribute a new high-quality dataset (named CustomHumans) for training and evaluation. Experimental results show both quantitatively and qualitatively that the method generates diverse, detailed avatars and outperforms state-of-the-art methods in model fitting performance.

In summary, the main contributions of this paper include:

- A novel hybrid representation that supports local editing across subjects.
- A generative pipeline for creating 3D avatars that supports fitting to unseen 3D scans and random sampling.
- A new large-scale high-quality 3D human scan dataset containing diverse subjects, body poses, and clothing.
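
The idea of a per-vertex feature codebook with local editing by feature swapping can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Codebook`, `swap_local_features`, and `interpolate_feature` are hypothetical, the codebook is a plain list of feature vectors rather than trainable parameters, and barycentric blending over a triangle is assumed here as one plausible way to query a feature at a surface point. The sketch relies only on the property stated in the summary, namely that all avatars share the same vertex topology.

```python
from typing import List, Sequence

# Hypothetical type: one feature vector per vertex of the shared body mesh.
Codebook = List[List[float]]


def swap_local_features(source: Codebook, target: Codebook,
                        vertex_ids: Sequence[int]) -> Codebook:
    """Return a copy of `target` whose rows at `vertex_ids` are taken from `source`."""
    if len(source) != len(target):
        raise ValueError("codebooks must share the same vertex topology")
    edited = [row[:] for row in target]  # copy so the target avatar is left untouched
    for v in vertex_ids:
        edited[v] = source[v][:]
    return edited


def interpolate_feature(codebook: Codebook, triangle: Sequence[int],
                        weights: Sequence[float]) -> List[float]:
    """Blend the features of a triangle's three vertices with barycentric weights.

    Assumption for illustration only; the summary does not specify the query scheme.
    """
    dim = len(codebook[0])
    return [sum(w * codebook[v][i] for v, w in zip(triangle, weights))
            for i in range(dim)]


# Toy example: four vertices, two feature channels per vertex.
avatar_a = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
avatar_b = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]

# Transfer the features stored on vertices 0 and 1 (say, a garment region) from A to B.
edited_b = swap_local_features(avatar_a, avatar_b, [0, 1])
```

Because features are indexed by vertex rather than by position in space, the same vertex indices refer to the same body region in every avatar and every pose, which is what makes this kind of swap meaningful across subjects.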

Learning Locally Editable Virtual Humans

NECA: Neural Customizable Human Avatar

Relightable and Animatable Neural Avatars from Videos

MetaAvatar: Learning Animatable Clothed Human Models from Few Depth Images

HR Human: Modeling Human Avatars with Triangular Mesh and High-Resolution Textures from Videos

GAN-Avatar: Controllable Personalized GAN-based Human Head Avatar

AttriHuman-3D: Editable 3D Human Avatar Generation with Attribute Decomposition and Indexing

HDHumans: A Hybrid Approach for High-fidelity Digital Humans

X-Avatar: Expressive Human Avatars

High-Fidelity Human Avatars from a Single RGB Camera

HQ3DAvatar: High Quality Controllable 3D Head Avatar

AG3D: Learning to Generate 3D Avatars from 2D Image Collections

AvatarStudio: Text-driven Editing of 3D Dynamic Human Head Avatars

Learning Disentangled Avatars with Hybrid 3D Representations

Neural Head Avatars from Monocular RGB Videos

Animatable Implicit Neural Representations for Creating Realistic Avatars from Videos

Learning Personalized High Quality Volumetric Head Avatars from Monocular RGB Videos

SCANimate: Weakly Supervised Learning of Skinned Clothed Avatar Networks

HINT: Learning Complete Human Neural Representations from Limited Viewpoints

Detailed Human Avatars from Monocular Video

Deformable 3D Gaussian Splatting for Animatable Human Avatars