DreamIdentity: Improved Editability for Efficient Face-identity Preserved Image Generation

Zhuowei Chen, Shancheng Fang, Wei Liu, Qian He, Mengqi Huang, Yongdong Zhang, Zhendong Mao

2023-07-01

Abstract: While large-scale pre-trained text-to-image models can synthesize diverse and high-quality human-centric images, an intractable problem is how to preserve the face identity for conditioned face images. Existing methods either require time-consuming optimization for each face identity or learn an efficient encoder at the cost of harming the editability of models. In this work, we present a method that requires no optimization for each face identity while keeping the editability of text-to-image models. Specifically, we propose a novel face-identity encoder to learn an accurate representation of human faces, which applies multi-scale face features followed by a multi-embedding projector to directly generate the pseudo words in the text embedding space. Besides, we propose self-augmented editability learning to enhance the editability of models, which is achieved by constructing paired generated face and edited face images using celebrity names, aiming at transferring the mature ability of off-the-shelf text-to-image models on celebrity faces to unseen faces. Extensive experiments show that our methods can generate identity-preserved images under different scenes at a much faster speed.
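The encoder idea in the abstract (multi-scale face features passed through a multi-embedding projector that outputs pseudo words) can be illustrated with a small sketch. This is not the authors' implementation: the fusion by concatenation, the plain linear projectors, the random stand-in features, and all names and sizes (`FEAT_DIM`, `NUM_SCALES`, `NUM_WORDS`, `EMBED_DIM`, `project_to_pseudo_words`) are assumptions chosen only to show the data flow.

```python
import random


def make_linear(in_dim, out_dim, rng):
    """Return a random (out_dim x in_dim) weight matrix as nested lists."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def matvec(weight, vec):
    """Multiply a nested-list matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def project_to_pseudo_words(multi_scale_feats, projectors):
    """Fuse per-scale features (assumed: concatenation) and emit one
    pseudo-word embedding per projector."""
    fused = [x for feat in multi_scale_feats for x in feat]
    return [matvec(weight, fused) for weight in projectors]


# Hypothetical sizes, far smaller than any real model.
FEAT_DIM = 8     # feature size per scale
NUM_SCALES = 3   # number of backbone layers tapped
NUM_WORDS = 2    # number of pseudo words produced
EMBED_DIM = 16   # width of the text embedding space

rng = random.Random(0)

# Stand-ins for features a face backbone would produce at several depths.
multi_scale_feats = [
    [rng.gauss(0.0, 1.0) for _ in range(FEAT_DIM)] for _ in range(NUM_SCALES)
]

# One projector per pseudo word, each reading the fused feature vector.
projectors = [
    make_linear(FEAT_DIM * NUM_SCALES, EMBED_DIM, rng) for _ in range(NUM_WORDS)
]

pseudo_words = project_to_pseudo_words(multi_scale_feats, projectors)
print(len(pseudo_words), len(pseudo_words[0]))  # prints: 2 16
```

In a real system the resulting vectors would take the place of placeholder-token embeddings in the text encoder input, so the rest of the prompt remains free to describe the scene.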

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses face-identity preservation in text-to-image (T2I) generation. The researchers propose a method that efficiently generates new images matching the identity of a given face image while still allowing diverse variations driven by different text prompts. The main contributions of the paper are:

1. **DreamIdentity framework**: An optimization-free method that quickly generates images consistent with the input face identity and can be flexibly edited with text prompts.
2. **Multi-word Multi-scale Identity Encoder (M2ID Encoder)**: To represent face identity accurately, the researchers designed an identity encoder based on a Vision Transformer architecture. It extracts features at different scales and maps them into multiple word embeddings to obtain a more refined identity representation.
3. **Self-Augmented Editability Learning**: To address the inconsistency between training and testing, the paper trains the model on a self-augmented dataset to improve its editing capability. The dataset is built by using an existing T2I model to generate celebrity faces and edited variants of them, with the aim of teaching the model to edit the input face according to text prompts.

Experimental results show that DreamIdentity achieves high-quality text-guided editing while preserving face identity, and that it is faster and more effective than existing methods. The paper also discusses limitations of the method, such as limited ability to handle low-quality or out-of-domain images.
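The editability-learning dataset described above pairs a generated celebrity face with an edited variant of the same celebrity. A minimal sketch of how the prompt side of such pairs might be laid out follows. The prompt templates, the `S*` placeholder for the pseudo words, the placeholder celebrity names, and the `build_edit_pairs` helper are all assumptions for illustration. No T2I model is called here; the source and target prompts would be passed to one to render the actual image pair.

```python
from itertools import product

PSEUDO_WORD = "S*"  # hypothetical placeholder that the identity encoder fills in


def build_edit_pairs(names, edit_templates):
    """Build one training record per (celebrity, edit) combination.

    source_prompt: prompt used to generate the source face
    target_prompt: prompt used to generate the edited face of the same person
    edit_prompt:   same edit, with the name swapped for the pseudo word
    """
    records = []
    for name, template in product(names, edit_templates):
        records.append(
            {
                "source_prompt": f"a photo of {name}",
                "target_prompt": template.format(name),
                "edit_prompt": template.format(PSEUDO_WORD),
            }
        )
    return records


# Placeholder names; a real pipeline would use names the T2I model knows well.
names = ["Celebrity A", "Celebrity B"]
edit_templates = ["{} wearing a red hat", "{} as an oil painting"]

pairs = build_edit_pairs(names, edit_templates)
for record in pairs:
    print(record["edit_prompt"], "<-", record["target_prompt"])
```

During training, the source face would go through the identity encoder, the edit prompt would condition generation, and the image rendered from the target prompt would serve as supervision, so the model sees editing examples rather than reconstruction alone.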


FaceChain: A Playground for Identity-Preserving Portrait Generation

DreamSalon: A Staged Diffusion Framework for Preserving Identity-Context in Editable Face Generation

StableIdentity: Inserting Anybody into Anywhere at First Sight

MasterWeaver: Taming Editability and Face Identity for Personalized Text-to-Image Generation

Imagine yourself: Tuning-Free Personalized Image Generation

A Data Perspective on Enhanced Identity Preservation for Diffusion Personalization

DreamArtist: Towards Controllable One-Shot Text-to-Image Generation via Positive-Negative Prompt-Tuning

FaceStudio: Put Your Face Everywhere in Seconds

DreamFace: Progressive Generation of Animatable 3D Faces under Text Guidance

DreamBlend: Advancing Personalized Fine-tuning of Text-to-Image Diffusion Models

FlashFace: Human Image Personalization with High-fidelity Identity Preservation

DreamSteerer: Enhancing Source Image Conditioned Editability using Personalized Diffusion Models

Fusion is all you need: Face Fusion for Customized Identity-Preserving Image Synthesis

ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning

AttnDreamBooth: Towards Text-Aligned Personalized Text-to-Image Generation

Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models

Infinite-ID: Identity-preserved Personalization via ID-semantics Decoupling Paradigm

HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models

DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation

Identity-Aware and Shape-Aware Propagation of Face Editing in Videos