MagicFace: High-Fidelity Facial Expression Editing with Action-Unit Control

Mengting Wei, Tuomas Varanka, Xingxun Jiang, Huai-Qian Khor, Guoying Zhao
2025-01-04
Abstract: We address the problem of facial expression editing by controlling the relative variation of facial action units (AUs) from the same person. This enables us to edit this specific person's expression in a fine-grained, continuous and interpretable manner, while preserving their identity, pose, background and detailed facial attributes. Key to our model, which we dub MagicFace, is a diffusion model conditioned on AU variations and an ID encoder to preserve facial details of high consistency. Specifically, to preserve the facial details with the input identity, we leverage the power of pretrained Stable-Diffusion models and design an ID encoder to merge appearance features through self-attention. To keep background and pose consistency, we introduce an efficient Attribute Controller by explicitly informing the model of current background and pose of the target. By injecting AU variations into a denoising UNet, our model can animate arbitrary identities with various AU combinations, yielding superior results in high-fidelity expression editing compared to other facial expression editing works. Code is publicly available at https://github.com/weimengting/MagicFace.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses how to edit facial expressions in a fine-grained, continuous, and interpretable manner while preserving identity, pose, background, and other facial details. Specifically, the authors propose a model named MagicFace, which achieves this by controlling the variation of facial action units (AUs).

### Core of the Problem

1. **Challenges in Facial Expression Editing**:
   - Existing methods usually represent facial expressions with latent-space codes or 3DMM parameters, which do not give an intuitive, interpretable description of an expression.
   - They lack precise control over specific AUs, so the intensity and location of an expression cannot be adjusted flexibly.
2. **Limitations of Existing Methods**:
   - Generative models such as GANs can produce high-quality images, but they often lose identity information or alter the background when editing facial expressions.
   - Other diffusion-based methods can produce high-quality images but do not support flexible expression editing.

### MagicFace's Solution

MagicFace addresses these problems as follows:

- **Action Unit (AU) as a Condition**: AU variations represent and control the facial expression, so users can directly adjust the intensity of specific action units.
- **ID Encoder**: An ID encoder retains the identity features of the input image, so the edited image keeps the original identity.
- **Attribute Controller**: An attribute controller keeps the background and pose consistent, so these attributes do not change during editing.
- **Diffusion Model**: Built on a pretrained Stable-Diffusion model, MagicFace generates high-fidelity edits by injecting AU variations into the denoising process.
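As a rough illustration of what "adjusting the intensity of specific action units" could look like from the user's side, the sketch below turns a sparse set of per-AU edits into a dense condition vector. The AU list, the function name `build_au_variation`, and the example values are assumptions made for illustration; they are not taken from the MagicFace code, which may use a different AU set, ordering, or scale.

```python
# Hypothetical AU ordering; the set supported by MagicFace may differ.
AU_NAMES = ["AU1", "AU2", "AU4", "AU5", "AU6", "AU9",
            "AU12", "AU15", "AU17", "AU20", "AU25", "AU26"]


def build_au_variation(edits, au_names=AU_NAMES):
    """Expand a sparse {AU name: intensity change} dict into a dense vector.

    AUs that are not mentioned get a change of 0.0, i.e. they are left alone.
    """
    unknown = set(edits) - set(au_names)
    if unknown:
        raise KeyError(f"unknown action units: {sorted(unknown)}")
    return [float(edits.get(name, 0.0)) for name in au_names]


# Example: strengthen AU12 (lip corner puller) and relax AU4 (brow lowerer).
variation = build_au_variation({"AU12": 2.0, "AU4": -1.0})
```

Keeping the condition as one value per AU is what makes the edit interpretable: each entry maps to a named facial muscle movement, and untouched entries stay at zero.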
### Experimental Verification

To verify the effectiveness of MagicFace, the authors conducted extensive quantitative and qualitative experiments. The results show that MagicFace outperforms the compared methods in AU accuracy, identity preservation, background preservation, and pose preservation.

### Summary

The main contribution of this paper is MagicFace, a facial expression editing method that edits expressions in a fine-grained, continuous, and interpretable manner while preserving identity, pose, and background. By conditioning on AU variations and combining an ID encoder with an attribute controller, MagicFace improves both the quality and the flexibility of facial expression editing.

### Formula Summary

- **AU Change Formula**:
  \[ c_{\text{AU}} = c_{\text{ID}} - c_{\text{tgt}} \]
  where \(c_{\text{ID}}\) is the AU intensity of the source image and \(c_{\text{tgt}}\) is the AU intensity of the target image.
- **Loss Function**:
  \[ L = \mathbb{E}_{z_t, c, \epsilon, t} \left[ \left\| \epsilon - \epsilon_\theta(z_t, c, t) \right\|_2^2 \right] \]
  where \(z_t\) is the noisy latent at time step \(t\), \(\epsilon\) is the noise, and \(c\) is the conditional embedding.
- **Conditional Noise Prediction Formula**:
  \[ \hat{\epsilon}_\theta(z_t, c_{\text{AU}}) = \epsilon_\theta(z_t, \emptyset) + \alpha \cdot (\epsilon_\theta(z_t, c_{\text{AU}}) - \epsilon_\theta(z_t, \emptyset)) \]
  where \(\alpha\) is the guidance scale and \(\emptyset\) denotes the empty (unconditional) condition.

Together, these components give MagicFace finer and more controllable facial expression editing.
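The three formulas in the summary can be checked numerically with a small sketch. It uses plain Python lists in place of tensors and works on a single sample rather than an expectation over a batch. The function names are hypothetical, and the noise predictions are passed in as given values, since the actual denoising UNet is outside the scope of this example.

```python
def au_variation(c_id, c_tgt):
    """AU change: c_AU = c_ID - c_tgt, element-wise, as written in the summary."""
    if len(c_id) != len(c_tgt):
        raise ValueError("source and target AU vectors must have equal length")
    return [s - t for s, t in zip(c_id, c_tgt)]


def denoising_loss(eps, eps_pred):
    """Squared L2 distance between true and predicted noise for one sample."""
    if len(eps) != len(eps_pred):
        raise ValueError("noise vectors must have equal length")
    return sum((a - b) ** 2 for a, b in zip(eps, eps_pred))


def guided_noise(eps_uncond, eps_cond, alpha):
    """Guided prediction: eps_uncond + alpha * (eps_cond - eps_uncond)."""
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("noise vectors must have equal length")
    return [u + alpha * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy values only; real inputs would be UNet outputs and AU intensity vectors.
c_au = au_variation([1.0, 0.0], [3.0, 0.0])
loss = denoising_loss([1.0, 2.0], [0.0, 0.0])
guided = guided_noise([0.0, 1.0], [1.0, 3.0], alpha=2.0)
```

With `alpha = 1` the guided prediction equals the conditional one, and with `alpha = 0` it equals the unconditional one. Values above 1 push the result further along the direction that the AU condition adds.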