Abstract: We address the problem of facial expression editing by controlling the relative variation of facial action units (AUs) from the same person. This enables us to edit this specific person's expression in a fine-grained, continuous and interpretable manner, while preserving their identity, pose, background and detailed facial attributes. Key to our model, which we dub MagicFace, is a diffusion model conditioned on AU variations and an ID encoder that preserves facial details with high consistency. Specifically, to keep the facial details consistent with the input identity, we leverage the power of pretrained Stable-Diffusion models and design an ID encoder to merge appearance features through self-attention. To keep background and pose consistent, we introduce an efficient Attribute Controller that explicitly informs the model of the current background and pose of the target. By injecting AU variations into a denoising UNet, our model can animate arbitrary identities with various AU combinations, yielding superior results in high-fidelity expression editing compared to other facial expression editing works. Code is publicly available at https://github.com/weimengting/MagicFace.

What problem does this paper attempt to address?

This paper addresses how to edit facial expressions in a fine-grained, continuous, and interpretable manner while preserving identity, pose, background, and other facial details. Specifically, the authors propose a model named MagicFace that achieves this by controlling the variation of facial Action Units (AUs).

### Core of the Problem

1. **Challenges in Facial Expression Editing**:
   - Existing methods usually represent facial expressions with latent codes or 3DMM parameters, and these representations struggle to provide intuitive, interpretable control.
   - Precise control over specific Action Units (AUs) is lacking, so the intensity and location of an expression cannot be adjusted flexibly.
2. **Limitations of Existing Methods**:
   - Generative models such as GANs can generate high-quality images, but often lose identity information or alter the background when editing facial expressions.
   - Other diffusion-based methods can generate high-quality images, but do not support flexible expression editing.

### MagicFace's Solution

MagicFace addresses these problems in the following ways:

- **Action Unit (AU) as a Condition**: Use AU variations to represent and control facial expressions, allowing users to intuitively adjust the intensity of specific action units.
- **ID Encoder**: Introduce an ID encoder to retain the identity features of the input image, ensuring that the edited image still maintains the original identity information.
- **Attribute Controller**: Design an attribute controller to keep the background and pose consistent, so that these attributes do not change during editing.
- **Diffusion Model**: Build on the pretrained Stable-Diffusion model and generate high-fidelity editing results by injecting AU variations.

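
To make the "AU as a condition" idea concrete, the following minimal sketch shows how a sparse set of per-AU offsets could be expanded into the dense vector that a conditional model consumes. The AU list `AU_ORDER`, its ordering, and the helper `build_au_variation` are hypothetical and chosen for illustration only; the summary does not specify the exact AU set, the vector layout, or which sign raises an AU's intensity.

```python
from typing import Dict, List

# Hypothetical AU set and ordering, for illustration only; the real model's
# AU set and vector layout may differ.
AU_ORDER: List[str] = [
    "AU1", "AU2", "AU4", "AU5", "AU6", "AU9",
    "AU12", "AU15", "AU17", "AU20", "AU25", "AU26",
]


def build_au_variation(edits: Dict[str, float]) -> List[float]:
    """Expand sparse per-AU offsets into a dense vector ordered like AU_ORDER."""
    unknown = set(edits) - set(AU_ORDER)
    if unknown:
        raise KeyError(f"unknown action units: {sorted(unknown)}")
    # AUs the user did not mention keep a zero offset, i.e. stay unchanged.
    return [float(edits.get(name, 0.0)) for name in AU_ORDER]


# Signed offsets for two AUs; every other AU is left untouched.
variation = build_au_variation({"AU12": 2.0, "AU4": -1.5})
print(variation)
```

Keeping untouched AUs at zero is what makes the edit relative: the condition describes only what should change, not the full target expression.
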
### Experimental Verification

To verify the effectiveness of MagicFace, the authors conducted a large number of experiments, including quantitative and qualitative evaluations. The results show that MagicFace is superior to other existing methods in terms of AU accuracy, identity preservation, background preservation, and pose preservation.

### Summary

The main contribution of this paper is a new facial expression editing method, MagicFace, which can edit facial expressions in a fine-grained, continuous, and interpretable manner while maintaining identity, pose, and background. By using AU changes as a condition and combining an ID encoder and an attribute controller, MagicFace significantly improves the quality and flexibility of facial expression editing.

### Formula Summary

- **AU Change Formula**:
  \[ c_{\text{AU}} = c_{\text{ID}} - c_{\text{tgt}} \]
  where \(c_{\text{ID}}\) is the AU intensity of the source image, and \(c_{\text{tgt}}\) is the AU intensity of the target image.
- **Loss Function**:
  \[ L = \mathbb{E}_{z_t, c, \epsilon, t} \left[ \left\| \epsilon - \epsilon_\theta(z_t, c, t) \right\|_2^2 \right] \]
  where \(\epsilon\) is noise, \(c\) is the conditional embedding, and \(t\) is the time step.
- **Conditional Noise Prediction Formula**:
  \[ \hat{\epsilon}_\theta(z_t, c_{\text{AU}}) = \epsilon_\theta(z_t, \emptyset) + \alpha \cdot (\epsilon_\theta(z_t, c_{\text{AU}}) - \epsilon_\theta(z_t, \emptyset)) \]
  where \(\alpha\) is the guidance scale parameter.

Through these technical means, MagicFace achieves more refined and controllable facial expression editing.
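
The three formulas above can be traced numerically with a small sketch. It uses plain Python lists in place of tensors. The function names (`au_variation`, `guided_noise`, `denoising_loss`) are hypothetical, and the noise vectors are made-up numbers rather than outputs of a real denoising UNet. The AU difference follows the sign convention exactly as written in the summary (source minus target).

```python
from typing import List, Sequence


def au_variation(c_id: Sequence[float], c_tgt: Sequence[float]) -> List[float]:
    """c_AU = c_ID - c_tgt, elementwise, as written in the formula summary."""
    if len(c_id) != len(c_tgt):
        raise ValueError("AU vectors must have the same length")
    return [s - t for s, t in zip(c_id, c_tgt)]


def guided_noise(eps_uncond: Sequence[float],
                 eps_cond: Sequence[float],
                 alpha: float) -> List[float]:
    """eps_uncond + alpha * (eps_cond - eps_uncond), elementwise."""
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("noise predictions must have the same length")
    return [u + alpha * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def denoising_loss(eps: Sequence[float], eps_pred: Sequence[float]) -> float:
    """Squared L2 norm ||eps - eps_pred||^2 for one sample.

    The training loss L is the expectation of this quantity over
    latents, conditions, noise samples, and time steps.
    """
    if len(eps) != len(eps_pred):
        raise ValueError("noise vectors must have the same length")
    return sum((e - p) ** 2 for e, p in zip(eps, eps_pred))


# Toy values standing in for AU intensities and UNet noise predictions.
c_au = au_variation([1.0, 0.0, 3.0], [0.5, 2.0, 3.0])
eps_hat = guided_noise([0.0, 1.0], [1.0, 3.0], alpha=2.0)
loss = denoising_loss([1.0, 2.0], [0.0, 0.0])
print(c_au, eps_hat, loss)
```

With \(\alpha = 1\) the guided prediction reduces to the conditional one, and with \(\alpha = 0\) to the unconditional one; larger values push the result further along the direction implied by the AU condition.
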

MagicFace: High-Fidelity Facial Expression Editing with Action-Unit Control
