Text-Driven Diverse Facial Texture Generation via Progressive Latent-Space Refinement

Chi Wang, Junming Huang, Rong Zhang, Qi Wang, Haotian Yang, Haibin Huang, Chongyang Ma, Weiwei Xu
2024-04-15
Abstract: Automatic 3D facial texture generation has gained significant interest recently. Existing approaches may not support the traditional physically based rendering pipeline or rely on 3D data captured by Light Stage. Our key contribution is a progressive latent space refinement approach that can bootstrap from 3D Morphable Models (3DMMs)-based texture maps generated from facial images to generate high-quality and diverse PBR textures, including albedo, normal, and roughness. It starts with enhancing Generative Adversarial Networks (GANs) for text-guided and diverse texture generation. To this end, we design a self-supervised paradigm to overcome the reliance on ground truth 3D textures and train the generative model with only entangled texture maps. Besides, we foster mutual enhancement between GANs and Score Distillation Sampling (SDS). SDS boosts GANs with more generative modes, while GANs promote more efficient optimization of SDS. Furthermore, we introduce an edge-aware SDS for multi-view consistent facial structure. Experiments demonstrate that our method outperforms existing 3D texture generation methods regarding photo-realistic quality, diversity, and efficiency.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the automatic generation of high-quality, diverse facial PBR (Physically Based Rendering) textures from text prompts. Existing methods either do not support the traditional physically based rendering pipeline or rely on 3D data captured with devices such as the Light Stage. They struggle to produce the decoupled UV texture maps (diffuse/albedo, roughness, and normal maps) that realistic rendering requires, and they also have difficulty following specific text prompts or generating diverse textures.

To address this, the paper proposes a progressive latent-space refinement method that bootstraps from texture maps produced by 3D Morphable Models (3DMMs) fitted to facial images and generates high-quality, diverse PBR textures. First, it enhances a Generative Adversarial Network (GAN) for text-guided, diverse texture generation, using a self-supervised paradigm that removes the need for ground-truth 3D textures, so the generator can be trained on entangled texture maps alone. Second, it lets GANs and Score Distillation Sampling (SDS) reinforce each other: SDS gives the GAN more generative modes, while the GAN makes SDS optimization more efficient. Third, it introduces Edge-Aware SDS (EASDS) to keep the facial structure consistent across multiple views. Experiments show that the method outperforms existing 3D texture generation methods in photo-realistic quality, diversity, and efficiency.
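To make the SDS component and the idea of optimizing SDS through a GAN's latent space more concrete, here is a toy sketch. It uses the standard SDS gradient popularized by DreamFusion, not necessarily the exact variant in this paper (and not the edge-aware version):

$$\nabla_\theta \mathcal{L}_{\mathrm{SDS}} = \mathbb{E}_{t,\epsilon}\left[w(t)\left(\hat{\epsilon}_\phi(x_t; y, t) - \epsilon\right)\frac{\partial x}{\partial \theta}\right], \qquad x_t = \sqrt{\bar{\alpha}_t}\,x + \sqrt{1-\bar{\alpha}_t}\,\epsilon,$$

where x = g(θ) is the generated texture or image, y is the text prompt, and ε̂_φ is the diffusion model's noise prediction; the denoiser's Jacobian is deliberately dropped. Everything in the code is a hypothetical stand-in: a linear "generator" `generator`, an ideal point-mass denoiser `toy_denoiser` in place of a text-conditioned diffusion model, the weighting w(t) = 1 - ᾱ_t, and sampling ᾱ uniformly in [0.1, 0.9] instead of sampling a timestep t.

```python
import numpy as np

# Toy illustration only (hypothetical setup, not the paper's networks):
# a linear "generator" maps a latent code to a flattened texture, and an
# ideal denoiser treats one fixed target texture as the whole data distribution.
rng = np.random.default_rng(0)
LATENT_DIM, TEXTURE_DIM = 4, 8
A, _ = np.linalg.qr(rng.standard_normal((TEXTURE_DIM, LATENT_DIM)))  # orthonormal columns
latent_star = rng.standard_normal(LATENT_DIM)
target = A @ latent_star  # stands in for "the texture the prompt describes"


def generator(latent):
    """Hypothetical linear generator G(latent) = A @ latent."""
    return A @ latent


def toy_denoiser(x_t, alpha_bar):
    """Exact noise prediction when the data distribution is a point mass at `target`."""
    return (x_t - np.sqrt(alpha_bar) * target) / np.sqrt(1.0 - alpha_bar)


def sds_step(latent, lr=0.5):
    """One SDS update applied to the generator's latent code."""
    alpha_bar = rng.uniform(0.1, 0.9)        # random noise level (stands in for sampling t)
    eps = rng.standard_normal(TEXTURE_DIM)   # Gaussian noise added to the texture
    x = generator(latent)
    x_t = np.sqrt(alpha_bar) * x + np.sqrt(1.0 - alpha_bar) * eps
    weight = 1.0 - alpha_bar                 # w(t), an assumed weighting choice
    grad_x = weight * (toy_denoiser(x_t, alpha_bar) - eps)  # no denoiser Jacobian
    grad_latent = A.T @ grad_x               # chain rule through the generator only
    return latent - lr * grad_latent


latent = np.zeros(LATENT_DIM)
for _ in range(300):
    latent = sds_step(latent)

print("max texture error:", np.abs(generator(latent) - target).max())
```

Because the gradient is pulled back through the generator (`A.T @ grad_x`), each update moves only within the 4-dimensional latent space instead of the 8-dimensional texture space, so every iterate remains a valid generator output. This is one plausible reading of how a GAN can make SDS optimization more efficient; the actual method uses learned networks and a real diffusion prior.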