Towards a Simultaneous and Granular Identity-Expression Control in Personalized Face Generation

Renshuai Liu, Bowen Ma, Wei Zhang, Zhipeng Hu, Changjie Fan, Tangjie Lv, Yu Ding, Xuan Cheng

2024-04-07

Abstract: In human-centric content generation, pre-trained text-to-image models struggle to produce the portrait images users want, images that retain an individual's identity while exhibiting diverse expressions. This paper introduces our efforts towards personalized face generation. To this end, we propose a novel multi-modal face generation framework capable of simultaneous identity-expression control and more fine-grained expression synthesis. Our expression control is so sophisticated that it can be specified by a fine-grained emotional vocabulary. We devise a novel diffusion model that can undertake the task of simultaneous face swapping and reenactment. Due to the entanglement of identity and expression, it is nontrivial to control them separately and precisely in one framework, so this has not been explored yet. To overcome this, we propose several innovative designs in the conditional diffusion model, including balanced identity and expression encoders, improved midpoint sampling, and explicit background conditioning. Extensive experiments have demonstrated the controllability and scalability of the proposed framework in comparison with state-of-the-art text-to-image, face swapping, and face reenactment methods.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper attempts to solve the problem of simultaneously controlling identity and expression in personalized face generation. Specifically, existing pre-trained text-to-image models have difficulty retaining an individual's identity while showing diverse expressions when generating the portrait images users want. In addition, the granularity of expression control in existing methods is still relatively coarse, usually limited to seven or eight common labels (such as "surprised", "happy", and "angry"), which cannot cover the full emotional space of the open world. To overcome these problems, the paper proposes a new multi-modal face generation framework that achieves simultaneous control of identity and expression together with finer-grained expression synthesis. The core of this framework is a novel diffusion model that performs face swapping and reenactment at the same time (Simultaneous Face Swapping and Reenactment, SFSR). By introducing balanced identity and expression encoders, an improved midpoint sampling method, and an explicit background condition, the model improves the quality and controllability of the generated images while remaining highly customizable.

In summary, the main contributions of the paper are:
- Proposing a new face generation framework that achieves simultaneous control of identity and expression and finer-grained expression synthesis.
- Defining a new face manipulation task, simultaneous face swapping and reenactment, which has not been explored by previous methods.
- Proposing three innovative designs in the conditional diffusion model, which increase the controllability of the model and the image quality.
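The summary names an improved midpoint sampling method but does not describe it. For orientation only, the sketch below shows the standard explicit midpoint rule (second-order Runge-Kutta) that midpoint-style diffusion samplers build on, applied to a generic ODE dx/dt = velocity(x, t). The function name `midpoint_sample` and the `velocity` callable are hypothetical; in a diffusion sampler, `velocity` would be the drift of the probability-flow ODE computed from the trained denoiser. The paper's own improvements to this rule are not reproduced here.

```python
import numpy as np


def midpoint_sample(velocity, x_start, t_start, t_end, num_steps):
    """Integrate dx/dt = velocity(x, t) from t_start to t_end with the explicit midpoint rule.

    Hypothetical helper for illustration: a generic RK2 integrator, not the paper's
    improved sampler. Time may run backwards (t_end < t_start), as in diffusion sampling.
    """
    x = np.asarray(x_start, dtype=float)
    ts = np.linspace(t_start, t_end, num_steps + 1)
    for t, t_next in zip(ts[:-1], ts[1:]):
        h = t_next - t
        # Half step using the slope at the current point.
        x_half = x + 0.5 * h * velocity(x, t)
        # Full step using the slope evaluated at the midpoint of the interval.
        x = x + h * velocity(x_half, t + 0.5 * h)
    return x


if __name__ == "__main__":
    # Toy check with a known solution: dx/dt = -x gives x(1) = x(0) * exp(-1).
    out = midpoint_sample(lambda x, t: -x, np.array([1.0, 2.0]), 0.0, 1.0, 100)
    print(out, np.exp(-1.0) * np.array([1.0, 2.0]))
```

The midpoint rule costs two drift evaluations per step but has second-order global error, so it typically reaches a given accuracy in fewer steps than a first-order Euler update; in diffusion sampling each drift evaluation is a network call, which is why such trade-offs matter.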
