ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning

Weifeng Chen, Jiacheng Zhang, Jie Wu, Hefeng Wu, Xuefeng Xiao, Liang Lin

2024-04-24

Abstract: The rapid development of diffusion models has triggered diverse applications. Identity-preserving text-to-image generation (ID-T2I) in particular has received significant attention due to its wide range of application scenarios, such as AI portraits and advertising. While existing ID-T2I methods have demonstrated impressive results, several key challenges remain: (1) it is hard to accurately maintain the identity characteristics of reference portraits, (2) the generated images lack aesthetic appeal, especially when identity retention is enforced, and (3) existing methods are not compatible with LoRA-based and Adapter-based approaches simultaneously. To address these issues, we present ID-Aligner, a general feedback learning framework to enhance ID-T2I performance. To address the loss of identity features, we introduce identity consistency reward fine-tuning, which uses feedback from face detection and recognition models to improve identity preservation in generated images. Furthermore, we propose identity aesthetic reward fine-tuning, which leverages rewards from human-annotated preference data and automatically constructed feedback on character structure generation to provide aesthetic tuning signals. Thanks to its universal feedback fine-tuning framework, our method can be readily applied to both LoRA and Adapter models, achieving consistent performance gains. Extensive experiments on SD1.5 and SDXL diffusion models validate the effectiveness of our approach.

Computer Vision and Pattern Recognition, Artificial Intelligence

What problem does this paper attempt to address?

The paper proposes ID-Aligner, a feedback learning framework for improving identity preservation in text-to-image generation. Existing methods struggle with three things: accurately preserving the facial identity of the reference portrait, keeping generated images aesthetically appealing while identity is enforced, and working with both LoRA-based and Adapter-based approaches. ID-Aligner addresses these issues with two reward signals. An identity consistency reward, derived from face detection and recognition models, improves how closely the generated face matches the reference. An identity aesthetic reward, derived from human-annotated preference data and automatically constructed feedback on character structure, improves visual appeal. Because the framework only relies on reward feedback during fine-tuning, it can be applied to both LoRA and Adapter models, and experiments on SD1.5 and SDXL show gains in identity preservation and aesthetic quality over existing methods.
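To make the identity consistency reward more concrete, here is a minimal sketch in Python. It assumes that the face detection and recognition models have already turned the reference portrait and the generated image into fixed-length face embeddings, and that the reward is the cosine similarity between the two. The abstract does not give the exact formulation, so the function names, the cosine-similarity choice, and the `1 - reward` loss are all illustrative assumptions rather than the authors' implementation.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def identity_consistency_reward(
    ref_embedding: Sequence[float], gen_embedding: Sequence[float]
) -> float:
    # Assumption: both embeddings come from the same face recognition model,
    # applied to face crops found by a face detector. Higher means the
    # generated face is closer to the reference identity.
    return cosine_similarity(ref_embedding, gen_embedding)


def identity_loss(
    ref_embedding: Sequence[float], gen_embedding: Sequence[float]
) -> float:
    # Hypothetical fine-tuning objective: minimizing this value
    # maximizes the identity consistency reward.
    return 1.0 - identity_consistency_reward(ref_embedding, gen_embedding)
```

In a real training loop the embeddings would be differentiable tensors so that the reward can be backpropagated into the LoRA or Adapter weights; plain Python lists are used here only to keep the sketch self-contained.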
