DiffLoRA: Generating Personalized Low-Rank Adaptation Weights with Diffusion

Yujia Wu, Yiming Shi, Jiwei Wei, Chengwei Sun, Yuyang Zhou, Yang Yang, Heng Tao Shen
2024-08-18
Abstract: Personalized text-to-image generation has gained significant attention for its capability to generate high-fidelity portraits of specific identities conditioned on user-defined prompts. Existing methods typically involve test-time fine-tuning or instead incorporating an additional pre-trained branch. However, these approaches struggle to simultaneously address the demands of efficiency, identity fidelity, and preserving the model's original generative capabilities. In this paper, we propose DiffLoRA, a novel approach that leverages diffusion models as a hypernetwork to predict personalized low-rank adaptation (LoRA) weights based on the reference images. By integrating these LoRA weights into the text-to-image model, DiffLoRA achieves personalization during inference without further training. Additionally, we propose an identity-oriented LoRA weight construction pipeline to facilitate the training of DiffLoRA. By utilizing the dataset produced by this pipeline, our DiffLoRA consistently generates high-performance and accurate LoRA weights. Extensive evaluations demonstrate the effectiveness of our method, achieving both time efficiency and maintaining identity fidelity throughout the personalization process.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

The paper aims to address several key issues in personalized text-to-image generation:

1. **Efficiency**: Existing personalization methods typically require large datasets and long training times, so each personalization run takes 10 to 30 minutes, which is impractical for user-centric applications.
2. **Identity Fidelity**: Existing methods often struggle to maintain the identity fidelity of the generated images during personalization, especially when generating high-resolution images.
3. **Preservation of the Model's Original Generative Capability**: Some methods achieve personalization by adding extra trainable branches, but these often compromise the model's original generative capability and flexibility.

To tackle these issues, the paper proposes a new method called DiffLoRA. DiffLoRA uses a diffusion model as a hypernetwork to predict personalized low-rank adaptation (LoRA) weights from reference images, then integrates these weights into the text-to-image model, enabling personalization at inference time without further training. Additionally, the paper introduces an identity-oriented LoRA weight construction pipeline that produces the dataset used to train DiffLoRA. With this framework, DiffLoRA aims to make personalization more efficient while maintaining identity fidelity.
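
To make "integrating LoRA weights into the model" concrete, the sketch below shows the standard LoRA merge, in which a frozen weight matrix W is replaced by W + s·(B·A) for low-rank factors B and A. This is an illustration under assumptions, not the paper's implementation: the summary above does not specify how DiffLoRA merges its predicted weights, the helper names `matmul` and `merge_lora` and the `scale` parameter are hypothetical, and the toy factors are hand-written stand-ins for what the diffusion hypernetwork would predict.

```python
# Illustrative sketch only: the standard LoRA merge W' = W + scale * (B @ A).
# `matmul`, `merge_lora`, and `scale` are hypothetical names, not from the paper.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]

def merge_lora(weight, lora_b, lora_a, scale=1.0):
    """Return a new matrix W + scale * (B @ A); the frozen W is not modified."""
    delta = matmul(lora_b, lora_a)
    assert len(delta) == len(weight) and len(delta[0]) == len(weight[0])
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]

# Toy 3x3 frozen weight with a rank-1 update. In DiffLoRA the factors would be
# predicted from reference images; here they are hand-written placeholders.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
B = [[1.0], [2.0], [3.0]]   # shape 3x1
A = [[0.5, 0.0, -0.5]]      # shape 1x3

W_personalized = merge_lora(W, B, A, scale=2.0)
```

Because the merge only adds a low-rank term to existing weights, no extra branch is attached to the network, and discarding the factors recovers the original model. That property is what lets a predicted set of LoRA weights personalize the model at inference time without a fine-tuning loop.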