Revealing Directions for Text-guided 3D Face Editing

Zhuo Chen, Yichao Yan, Sehngqi Liu, Yuhao Cheng, Weiming Zhao, Lincheng Li, Mengxiao Bi, Xiaokang Yang
2024-10-07
Abstract: 3D face editing is a significant task in multimedia, aimed at the manipulation of 3D face models across various control signals. The success of 3D-aware GANs provides expressive 3D models learned from 2D single-view images only, encouraging researchers to discover semantic editing directions in their latent space. However, previous methods face challenges in balancing quality, efficiency, and generalization. To solve the problem, we explore the possibility of introducing the strengths of diffusion models into 3D-aware GANs. In this paper, we present Face Clan, a fast and text-general approach for generating and manipulating 3D faces based on arbitrary attribute descriptions. To achieve disentangled editing, we propose to diffuse on the latent space under a pair of opposite prompts to estimate the mask indicating the region of interest on latent codes. Based on the mask, we then apply denoising to the masked latent codes to reveal the editing direction. Our method offers a precisely controllable manipulation method, allowing users to intuitively customize regions of interest with the text description. Experiments demonstrate the effectiveness and generalization of our Face Clan for various pre-trained GANs. It offers an intuitive and wide application for text-guided face editing that contributes to the landscape of multimedia content creation.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses two challenges in 3D face editing:
1. **Balancing quality, efficiency, and generalization**: Previous 3D face editing methods struggle to deliver high-quality edits, fast editing, and broad applicability at the same time. The authors point out that supervised methods require large amounts of labeled data and are time-consuming, while unsupervised methods are identity-sensitive and have difficulty finding the arbitrary semantic directions that users ask for.
2. **Precise control and identity preservation**: A key issue during editing is changing only the target region (such as texture or geometric features) without affecting other parts, while keeping the original identity consistent. Existing methods perform poorly on complex attributes (such as hairstyles and hats), especially in color and texture editing.

To solve these problems, the authors propose **Face Clan**, a fast and general text-guided 3D face editing method based on a diffusion model. The method proceeds in three steps:
- **Introducing the diffusion model**: Apply a diffusion model to the latent space of the GAN to align the distribution of text conditions with that of latent codes. The diffusion model enhances the diversity and consistency of the text-to-latent-space mapping through multi-step cumulative deviations.
- **Estimating the direction mask**: By analyzing the difference in predicted noise under a pair of opposite descriptions (such as "wearing a hat" and "not wearing a hat"), estimate a mask that indicates the region of interest in the latent code. This lets the edit focus on a specific region while preserving the rest.
- **Denoising operation**: Apply the denoising process within the masked region to reveal the editing direction.
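The mask-then-denoise idea in the steps above can be illustrated with a small NumPy sketch. It is a toy under stated assumptions, not the authors' implementation: `toy_noise_predictor` is a hypothetical stand-in for a text-conditioned diffusion network, prompt embeddings are plain vectors in the same space as the latent code, and the threshold `tau`, the step count, and the step size are illustrative values that do not come from the paper.

```python
import numpy as np

def toy_noise_predictor(latent, prompt_emb):
    """Hypothetical stand-in for a text-conditioned noise predictor.

    The predicted "noise" is simply the offset of the latent from the
    prompt embedding, so removing it pulls the latent toward the prompt.
    """
    return latent - prompt_emb

def estimate_mask(latent, pos_emb, neg_emb, tau=0.5):
    """Mark latent entries whose predicted noise differs most between
    a pair of opposite prompts (assumed thresholding rule)."""
    diff = np.abs(toy_noise_predictor(latent, pos_emb)
                  - toy_noise_predictor(latent, neg_emb))
    peak = diff.max()
    if peak == 0:
        # The two prompts are indistinguishable: nothing to edit.
        return np.zeros_like(latent)
    return (diff / peak > tau).astype(latent.dtype)

def masked_denoise(latent, target_emb, mask, steps=20, step_size=0.2):
    """Iteratively denoise only the masked entries; the rest stay fixed."""
    edited = latent.copy()
    for _ in range(steps):
        eps = toy_noise_predictor(edited, target_emb)
        edited = edited - step_size * mask * eps
    return edited

rng = np.random.default_rng(0)
latent = rng.normal(size=16)      # toy latent code
neg_emb = np.zeros(16)            # stands in for "not wearing a hat"
pos_emb = neg_emb.copy()
pos_emb[[2, 5]] = 3.0             # stands in for "wearing a hat"

mask = estimate_mask(latent, pos_emb, neg_emb)
edited = masked_denoise(latent, pos_emb, mask)
direction = edited - latent       # the revealed editing direction
```

In this toy setup the two prompts differ only in entries 2 and 5, so the mask selects exactly those entries and `direction` is zero everywhere else, mirroring how a masked edit leaves the unrelated parts of the latent code (and hence the identity) untouched.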
This method lets users intuitively customize the region of interest through a text description, achieving precisely controllable editing. Experimental results show that Face Clan performs well on a variety of pre-trained GANs and achieves high-quality, efficient, and widely applicable text-guided 3D face editing, which makes it particularly suited to multimedia content creation.