Learning Locally Editable Virtual Humans

Hsuan-I Ho, Lixin Xue, Jie Song, Otmar Hilliges
2023-04-29
Abstract: In this paper, we propose a novel hybrid representation and end-to-end trainable network architecture to model fully editable and customizable neural avatars. At the core of our work lies a representation that combines the modeling power of neural fields with the ease of use and inherent 3D consistency of skinned meshes. To this end, we construct a trainable feature codebook to store local geometry and texture features on the vertices of a deformable body model, thus exploiting its consistent topology under articulation. This representation is then employed in a generative auto-decoder architecture that admits fitting to unseen scans and sampling of realistic avatars with varied appearances and geometries. Furthermore, our representation allows local editing by swapping local features between 3D assets. To verify our method for avatar creation and editing, we contribute a new high-quality dataset, dubbed CustomHumans, for training and evaluation. Our experiments quantitatively and qualitatively show that our method generates diverse detailed avatars and achieves better model fitting performance compared to state-of-the-art methods. Our code and dataset are available at https://custom-humans.github.io/.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the problem of creating editable and customizable neural avatars (3D virtual characters). Specifically, it targets three goals:

1. **Local editing**: Users can easily transfer partial geometric details and appearance features between different 3D assets, for example to change clothes or modify the patterns on a garment.
2. **Customizable details**: Users can draw clothing details such as logos and letters directly in 2D and transfer them onto the 3D avatar.
3. **Consistent local details**: Generated avatars keep their local details consistent when their pose changes.

To achieve these goals, the authors propose a novel hybrid representation and an end-to-end trainable network architecture that combine the modeling power of neural fields with the ease of use and inherent 3D consistency of skinned meshes. Concretely, they construct a trainable feature codebook that stores local geometry and texture features on the vertices of a deformable body model, exploiting the model's consistent topology under articulation. This representation is used in a generative auto-decoder architecture that can be fitted to unseen scans and can sample realistic avatars with diverse appearances and geometries. The same representation supports local editing by swapping local features between 3D assets.

To validate the method for avatar creation and editing, the authors contribute a new high-quality dataset, CustomHumans, for training and evaluation. Experiments show, both quantitatively and qualitatively, that the method generates diverse, detailed avatars and achieves better model fitting performance than state-of-the-art methods.

In summary, the main contributions of this paper are:

- A novel hybrid representation that supports local editing across subjects.
- A generative pipeline for creating 3D avatars that can be fitted to unseen 3D scans and supports random sampling.
- A new large-scale, high-quality 3D human scan dataset containing diverse subjects, body poses, and clothing.
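The per-vertex codebook and feature swapping described above can be illustrated with a minimal sketch. It is written under stated assumptions and is not the authors' code (their release is at https://custom-humans.github.io/). It assumes that the features for a surface point are blended barycentrically from the three vertices of the triangle containing it, and that an editable region (for example a garment) is given as a set of vertex indices. `VertexFeatureCodebook`, `query`, `swap_local_features` and the region are hypothetical names, and the decoder that maps features to geometry and color is omitted. The point of the sketch is why shared topology makes local editing cheap: vertex `i` addresses the same body location on every avatar and in every pose, so an edit reduces to copying rows between codebooks.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple

Vector = List[float]


@dataclass
class VertexFeatureCodebook:
    """Hypothetical per-avatar codebook: one geometry and one texture feature
    vector per vertex of a body mesh whose topology is shared by all avatars."""
    geometry: List[Vector]  # [num_vertices][feature_dim]
    texture: List[Vector]   # [num_vertices][feature_dim]


def interpolate(features: Sequence[Vector], tri: Tuple[int, int, int],
                bary: Tuple[float, float, float]) -> Vector:
    """Blend the features of a triangle's three vertices with barycentric
    weights (assumed non-negative and summing to 1)."""
    out = [0.0] * len(features[tri[0]])
    for vid, w in zip(tri, bary):
        for d, value in enumerate(features[vid]):
            out[d] += w * value
    return out


def query(codebook: VertexFeatureCodebook, tri: Tuple[int, int, int],
          bary: Tuple[float, float, float]) -> Tuple[Vector, Vector]:
    """Local (geometry, texture) features for a surface point. In a full
    pipeline a decoder network (not shown) would turn these into shape and color."""
    return (interpolate(codebook.geometry, tri, bary),
            interpolate(codebook.texture, tri, bary))


def swap_local_features(target: List[Vector], source: List[Vector],
                        region: Iterable[int]) -> List[Vector]:
    """Copy of `target` whose features on `region` come from `source`.
    Valid only because vertex i means the same body location in both."""
    if len(target) != len(source):
        raise ValueError("codebooks must share the same mesh topology")
    edited = [list(f) for f in target]  # copy rows; `target` is left untouched
    for vid in region:
        edited[vid] = list(source[vid])
    return edited


if __name__ == "__main__":
    # Toy 4-vertex "mesh" with 1-D features; a real body mesh has thousands.
    avatar_a = VertexFeatureCodebook(geometry=[[0.0]] * 4, texture=[[0.0]] * 4)
    avatar_b = VertexFeatureCodebook(geometry=[[1.0], [2.0], [3.0], [4.0]],
                                     texture=[[1.0], [2.0], [3.0], [4.0]])
    shirt = {1, 2}  # hypothetical vertex region, e.g. a garment segment
    edited = VertexFeatureCodebook(
        geometry=swap_local_features(avatar_a.geometry, avatar_b.geometry, shirt),
        texture=avatar_a.texture,  # keep A's appearance, take B's geometry
    )
    print(query(edited, (0, 1, 2), (0.5, 0.25, 0.25)))  # -> ([1.25], [0.0])
```

Because geometry and texture live in separate tables here, they can be swapped independently, mirroring the distinction between transferring geometric details and appearance features; how the actual method defines and partitions editable regions is not specified by this sketch.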