ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving

Jiehui Huang, Xiao Dong, Wenhui Song, Hanhui Li, Jun Zhou, Yuhao Cheng, Shutao Liao, Long Chen, Yiqiang Yan, Shengcai Liao, Xiaodan Liang

2024-04-26

Abstract: Diffusion-based technologies have made significant strides, particularly in personalized and customized facial generation. However, existing methods face challenges in achieving high-fidelity and detailed identity (ID) consistency, primarily due to insufficient fine-grained control over facial areas and the lack of a comprehensive strategy for ID preservation that fully considers both intricate facial details and the overall face. To address these limitations, we introduce ConsistentID, an innovative method crafted for diverse identity-preserving portrait generation under fine-grained multimodal facial prompts, utilizing only a single reference image. ConsistentID comprises two key components: a multimodal facial prompt generator that combines facial features, corresponding facial descriptions, and the overall facial context to enhance precision in facial details, and an ID-preservation network optimized through the facial attention localization strategy, aimed at preserving ID consistency in facial regions. Together, these components significantly enhance the accuracy of ID preservation by introducing fine-grained multimodal ID information from facial regions. To facilitate training of ConsistentID, we present a fine-grained portrait dataset, FGID, with over 500,000 facial images, offering greater diversity and comprehensiveness than existing public facial datasets. Experimental results substantiate that ConsistentID achieves exceptional precision and diversity in personalized facial generation, surpassing existing methods on the MyStyle dataset. Furthermore, while ConsistentID introduces more multimodal ID information, it maintains a fast inference speed during generation.

Computer Vision and Pattern Recognition, Artificial Intelligence

What problem does this paper attempt to address?

This paper addresses identity consistency and facial-detail fidelity in personalized portrait generation. Existing methods struggle to produce diverse portraits from a single reference image while keeping the identity both faithful and detailed. To tackle this, the paper proposes ConsistentID, which consists of a fine-grained multimodal facial prompt generator and an ID-preservation network. The prompt generator improves facial detail by integrating facial features, their textual descriptions, and the overall facial context; the ID-preservation network maintains identity consistency within facial regions through a facial attention localization strategy. The paper also introduces FGID, a fine-grained portrait dataset, to facilitate training and provide a more comprehensive evaluation.
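To make the idea of attention localization concrete, here is a minimal sketch of one way such an objective could be written. Neither the abstract nor the summary above gives the actual loss used by ConsistentID, so everything in this block is an assumption made for illustration: the function name `localization_loss`, the use of mean squared error, and the toy masks are all hypothetical. The only point it shows is that attention concentrated inside a facial region's mask scores lower than attention that leaks outside it.

```python
# Illustrative sketch only: the exact loss used by ConsistentID is not stated
# in the text above. Mean squared error is an assumed stand-in.

def localization_loss(attn_maps, region_masks):
    """Mean squared error between attention maps and binary region masks.

    attn_maps, region_masks: lists of equally sized flat lists of numbers,
    one entry per facial region (hypothetical layout).
    """
    total, count = 0.0, 0
    for attn, mask in zip(attn_maps, region_masks):
        if len(attn) != len(mask):
            raise ValueError("attention map and mask must have the same size")
        for a, m in zip(attn, mask):
            total += (a - m) ** 2
            count += 1
    return total / count if count else 0.0


# Toy data: two facial regions on a 4-pixel "image".
masks = [[1, 1, 0, 0], [0, 0, 0, 1]]
focused = [[0.9, 0.8, 0.1, 0.0], [0.0, 0.1, 0.1, 0.9]]  # attention stays in-region
leaky = [[0.5, 0.4, 0.6, 0.5], [0.5, 0.5, 0.4, 0.3]]    # attention spreads out

if __name__ == "__main__":
    print("focused:", localization_loss(focused, masks))
    print("leaky:  ", localization_loss(leaky, masks))
```

In a real training setup the attention maps would come from the diffusion model's cross-attention layers and the masks from a face parser; both are outside the scope of this sketch.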

FaceChain: A Playground for Identity-Preserving Portrait Generation

UniPortrait: A Unified Framework for Identity-Preserving Single- and Multi-Human Image Personalization

MagicID: Flexible ID Fidelity Generation System

Towards a Simultaneous and Granular Identity-Expression Control in Personalized Face Generation

InstantID: Zero-shot Identity-Preserving Generation in Seconds

InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation

Identity-Guided Face Generation with Multi-Modal Contour Conditions

FaceStudio: Put Your Face Everywhere in Seconds

EmojiDiff: Advanced Facial Expression Control with High Identity Preservation in Portrait Generation

Infinite-ID: Identity-preserved Personalization via ID-semantics Decoupling Paradigm

Identity‐consistent transfer learning of portraits for digital apparel sample display

ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning

IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models

PersonaCraft: Personalized Full-Body Image Synthesis for Multiple Identities from Single References Using 3D-Model-Conditioned Diffusion

FaceChain-FACT: Face Adapter with Decoupled Training for Identity-preserved Personalization

Enhancing the Authenticity of Rendered Portraits with Identity-Consistent Transfer Learning

Overcoming False Illusions in Real-World Face Restoration with Multi-Modal Guided Diffusion Model

StableIdentity: Inserting Anybody into Anywhere at First Sight

FlashFace: Human Image Personalization with High-fidelity Identity Preservation