DreamIdentity: Improved Editability for Efficient Face-identity Preserved Image Generation

Zhuowei Chen, Shancheng Fang, Wei Liu, Qian He, Mengqi Huang, Yongdong Zhang, Zhendong Mao
2023-07-01
Abstract: While large-scale pre-trained text-to-image models can synthesize diverse and high-quality human-centric images, an intractable problem is how to preserve the face identity of conditioning face images. Existing methods either require time-consuming optimization for each face identity or learn an efficient encoder at the cost of harming the editability of models. In this work, we present a method that requires no optimization for each face identity while keeping the editability of text-to-image models. Specifically, we propose a novel face-identity encoder to learn an accurate representation of human faces, which applies multi-scale face features followed by a multi-embedding projector to directly generate the pseudo words in the text embedding space. Besides, we propose self-augmented editability learning to enhance the editability of models, which is achieved by constructing paired generated face and edited face images using celebrity names, aiming to transfer the mature ability of off-the-shelf text-to-image models on celebrity faces to unseen faces. Extensive experiments show that our method can generate identity-preserved images under different scenes at a much faster speed.
Computer Vision and Pattern Recognition
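The face-identity encoder described in the abstract can be illustrated with a minimal, dependency-free sketch. It makes three assumptions:

- the backbone returns one pooled feature vector per layer;
- a single linear layer serves as the multi-embedding projector;
- the resulting pseudo words simply replace a placeholder token (here the hypothetical `S*`) in the prompt's token embeddings.

The random backbone outputs, the layer indices in `LAYER_IDS`, the tiny dimensions, and every function name are hypothetical. A real encoder would use a pre-trained backbone and learned weights.

```python
import random
from typing import List

Vector = List[float]


def init_linear(in_dim: int, out_dim: int, rng: random.Random):
    """Randomly initialise a dense layer: weight is (out_dim, in_dim), bias has out_dim entries."""
    weight = [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def linear(x: Vector, weight, bias) -> Vector:
    """Compute weight @ x + bias with plain lists."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def multi_scale_features(layer_outputs: List[Vector], layer_ids: List[int]) -> Vector:
    """Concatenate the pooled features of several backbone layers (shallow to deep)."""
    fused: Vector = []
    for i in layer_ids:
        fused.extend(layer_outputs[i])
    return fused


def project_to_pseudo_words(feature, weight, bias, num_words, text_dim) -> List[Vector]:
    """Project the fused feature and split it into `num_words` word embeddings."""
    flat = linear(feature, weight, bias)
    return [flat[k * text_dim:(k + 1) * text_dim] for k in range(num_words)]


def inject_pseudo_words(tokens, token_embeddings, placeholder, pseudo_words):
    """Replace the placeholder token's embedding with all pseudo-word embeddings."""
    out: List[Vector] = []
    for tok, emb in zip(tokens, token_embeddings):
        if tok == placeholder:
            out.extend(pseudo_words)
        else:
            out.append(emb)
    return out


rng = random.Random(0)
VIS_DIM, TEXT_DIM, NUM_WORDS = 8, 4, 2
LAYER_IDS = [3, 7, 11]  # hypothetical shallow, middle and deep layers

# Stand-in for a 12-layer backbone: one random pooled vector per layer.
layer_outputs = [[rng.random() for _ in range(VIS_DIM)] for _ in range(12)]
fused = multi_scale_features(layer_outputs, LAYER_IDS)
weight, bias = init_linear(len(fused), NUM_WORDS * TEXT_DIM, rng)
pseudo = project_to_pseudo_words(fused, weight, bias, NUM_WORDS, TEXT_DIM)

tokens = ["a", "photo", "of", "S*", "on", "the", "beach"]
token_embeddings = [[0.0] * TEXT_DIM for _ in tokens]  # dummy text embeddings
conditioning = inject_pseudo_words(tokens, token_embeddings, "S*", pseudo)
print(len(conditioning))  # 8: six ordinary tokens plus two pseudo words
```

The sketch emits several pseudo words rather than one. This gives the identity more room in the text embedding space, which is the intent behind the "multi-embedding" projector. The rest of the prompt is left untouched, so ordinary words remain free to describe the scene.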
What problem does this paper attempt to address?
The paper addresses face-identity preservation in text-to-image (T2I) generation. Specifically, the authors propose a method that efficiently generates new images matching the identity of a given face image while allowing diverse variations driven by different text descriptions. The main contributions of the paper are:

1. **The DreamIdentity framework**: an optimization-free method that quickly generates images consistent with the input face identity and can be flexibly edited through text prompts.
2. **Multi-word Multi-scale ID Encoder (M2ID Encoder)**: to represent facial identity accurately, the authors design an identity encoder based on the Vision Transformer (ViT) architecture. It extracts features at multiple scales and maps them into multiple word embeddings, yielding a finer-grained identity representation.
3. **Self-augmented Editability Learning**: to address the inconsistency between training (reconstructing the input face) and testing (editing it according to a prompt), the paper trains the model on a self-augmented dataset to improve its editability. The dataset is built with an existing T2I model that generates celebrity faces together with edited images of the same celebrities, teaching the model how to edit an input face according to a text prompt.

Experimental results show that DreamIdentity achieves high-quality text-guided editing while preserving facial identity. It is also faster and more effective than existing methods; in particular, being optimization-free, it avoids per-identity fine-tuning. The paper also discusses limitations of the method, such as its limited ability to handle low-quality or out-of-domain images.
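The self-augmented pair construction can be sketched as follows. The sketch assumes three things:

- a text-to-image model exposed as a plain callable `generate(prompt)`;
- edit prompt templates with a `{}` slot for the identity word;
- the hypothetical placeholder `S*`.

The face prompt, the `EditPair` type, the stub generator, and the celebrity names are illustrative only. They are not the paper's actual data pipeline.

```python
from dataclasses import dataclass
from typing import Callable, List

PLACEHOLDER = "S*"  # hypothetical pseudo-word token


@dataclass
class EditPair:
    source_face: object  # generated face of the celebrity (encoder input)
    prompt: str          # edit prompt with the name replaced by the placeholder
    target: object       # generated edited image of the same celebrity (target)


def build_self_augmented_pairs(
    celebrities: List[str],
    edit_templates: List[str],
    generate: Callable[[str], object],
) -> List[EditPair]:
    """Pair each celebrity's generated face with edited images of that celebrity.

    `generate` stands in for an off-the-shelf text-to-image model, which
    already knows how to edit well-known faces by name.
    """
    pairs: List[EditPair] = []
    for name in celebrities:
        face = generate(f"a photo of {name}'s face")  # hypothetical face prompt
        for template in edit_templates:
            edited = generate(template.format(name))
            pairs.append(EditPair(face, template.format(PLACEHOLDER), edited))
    return pairs


def fake_generate(prompt: str) -> str:
    # Stub "image" that only records the prompt it came from.
    return f"<image: {prompt}>"


pairs = build_self_augmented_pairs(
    ["Celebrity A", "Celebrity B"],
    ["{} as a superhero", "{} wearing a red hat"],
    fake_generate,
)
print(len(pairs))       # 4
print(pairs[0].prompt)  # S* as a superhero
```

The input face and the target image differ, so the training signal asks the model to apply the edit prompt rather than reproduce its input. This is how such pairs target the training/testing mismatch, and how the T2I model's existing skill with celebrity names is passed on to unseen faces.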