Multi-LoRA Composition for Image Generation

Ming Zhong, Yelong Shen, Shuohang Wang, Yadong Lu, Yizhu Jiao, Siru Ouyang, Donghan Yu, Jiawei Han, Weizhu Chen

2024-02-27

Abstract: Low-Rank Adaptation (LoRA) is extensively utilized in text-to-image models for the accurate rendition of specific elements like distinct characters or unique styles in generated images. Nonetheless, existing methods face challenges in effectively composing multiple LoRAs, especially as the number of LoRAs to be integrated grows, thus hindering the creation of complex imagery. In this paper, we study multi-LoRA composition through a decoding-centric perspective. We present two training-free methods: LoRA Switch, which alternates between different LoRAs at each denoising step, and LoRA Composite, which simultaneously incorporates all LoRAs to guide more cohesive image synthesis. To evaluate the proposed approaches, we establish ComposLoRA, a new comprehensive testbed as part of this research. It features a diverse range of LoRA categories with 480 composition sets. Utilizing an evaluation framework based on GPT-4V, our findings demonstrate a clear improvement in performance with our methods over the prevalent baseline, particularly evident when increasing the number of LoRAs in a composition.
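The two decoding-time strategies described in the abstract can be illustrated with a toy sketch. It assumes each LoRA is represented by a separate denoiser callable (the base model with that one LoRA applied), forms each noise estimate with classifier-free guidance, and, for LoRA Composite, averages the guided estimates of all LoRAs with equal weights. The switching interval `tau`, the helper names, and the toy denoisers are hypothetical illustrations, not the authors' code.

```python
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical stand-in: each denoiser is "base model + one LoRA" and maps
# (latent, step, conditioned?) to a noise estimate.
Denoiser = Callable[[Vector, int, bool], Vector]


def cfg(eps_uncond: Vector, eps_cond: Vector, scale: float) -> Vector:
    """Classifier-free guidance: eps_u + s * (eps_c - eps_u)."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def active_lora(step: int, n_loras: int, tau: int = 1) -> int:
    """LoRA Switch schedule (assumed round-robin): change LoRA every `tau` steps."""
    return (step // tau) % n_loras


def lora_switch_eps(x: Vector, step: int, loras: Sequence[Denoiser],
                    scale: float, tau: int = 1) -> Vector:
    """Noise estimate from the single LoRA that is active at this step."""
    model = loras[active_lora(step, len(loras), tau)]
    return cfg(model(x, step, False), model(x, step, True), scale)


def lora_composite_eps(x: Vector, step: int, loras: Sequence[Denoiser],
                       scale: float) -> Vector:
    """Noise estimate averaging the guided estimates of all LoRAs (equal weights assumed)."""
    guided = [cfg(m(x, step, False), m(x, step, True), scale) for m in loras]
    n = len(guided)
    return [sum(values) / n for values in zip(*guided)]


def make_toy_lora(bias: float) -> Denoiser:
    """Toy denoiser whose conditional output is shifted by `bias`."""
    def model(x: Vector, step: int, cond: bool) -> Vector:
        return [xi * 0.1 + (bias if cond else 0.0) for xi in x]
    return model


loras = [make_toy_lora(1.0), make_toy_lora(-1.0)]
latent = [0.0, 0.0]
print([active_lora(t, 2) for t in range(4)])         # [0, 1, 0, 1]
print(lora_switch_eps(latent, 0, loras, 2.0))        # [2.0, 2.0]
print(lora_composite_eps(latent, 0, loras, 2.0))     # [0.0, 0.0]
```

With `tau=1` the active LoRA changes at every denoising step, matching the abstract's description of LoRA Switch. LoRA Composite instead queries every LoRA at every step, so in this sketch it costs roughly N times as many denoiser evaluations per step as LoRA Switch.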

Artificial Intelligence, Machine Learning, Computer Vision and Pattern Recognition, Computation and Language, Graphics

What problem does this paper attempt to address?

### The Problem the Paper Attempts to Solve

The paper addresses the difficulty of composing multiple LoRAs (Low-Rank Adaptations) in text-to-image generation models. Existing methods handle multi-LoRA integration poorly, and the problem worsens as more LoRAs must be combined, which limits the generation of complex images. To address this, the authors propose two training-free methods:

- **LoRA Switch**: alternates between different LoRAs at each denoising step to guide image generation.
- **LoRA Composite**: incorporates all LoRAs simultaneously to guide more cohesive image synthesis.

To evaluate these methods, the authors built ComposLoRA, a testbed covering a diverse range of LoRA categories with 480 composition sets, and scored results with a GPT-4V-based evaluation framework. Experiments show that both methods clearly outperform the prevalent LoRA merging baseline, with the advantage most evident as the number of LoRAs increases. The authors also conducted detailed human evaluations to further validate the methods, and analyzed potential biases in both the composition methods and the evaluation framework.
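For contrast, the LoRA merging baseline folds every LoRA into the base weights once, before sampling begins. The sketch below assumes the standard LoRA update ΔW = BA and a single per-adapter merge weight that absorbs any scaling factor; the function names and weights are illustrative assumptions, not the paper's configuration.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product: (m x k) @ (k x n) -> (m x n)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def merge_loras(w: Matrix, adapters: Sequence[Tuple[Matrix, Matrix, float]]) -> Matrix:
    """Return W + sum_i weight_i * (B_i @ A_i) without modifying W.

    Each adapter is (B, A, weight), with B of shape (d_out, r) and A of shape (r, d_in).
    """
    merged = [row[:] for row in w]  # copy so the base weights stay untouched
    for b, a, weight in adapters:
        delta = matmul(b, a)  # low-rank update for this LoRA
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                merged[i][j] += weight * value
    return merged


identity = [[1.0, 0.0], [0.0, 1.0]]
adapters = [
    ([[1.0], [0.0]], [[1.0, 1.0]], 0.5),  # rank-1 adapter: B is 2x1, A is 1x2
    ([[0.0], [1.0]], [[2.0, 0.0]], 0.5),
]
print(merge_loras(identity, adapters))  # [[1.5, 0.5], [1.0, 1.0]]
```

Because the merged matrix stays fixed for the whole denoising trajectory, this baseline cannot give different LoRAs different roles at different steps, which is the point where LoRA Switch and LoRA Composite, operating at decoding time, differ from it.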


CLoRA: A Contrastive Approach to Compose Multiple LoRA Models

MultiLoRA: Democratizing LoRA for Better Multi-Task Learning

LoRA Fusion: Enhancing Image Generation

LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models

LoraRetriever: Input-Aware LoRA Retrieval and Composition for Mixed Tasks in the Wild

In-Context LoRA for Diffusion Transformers

Merging LoRAs like Playing LEGO: Pushing the Modularity of LoRA to Extremes Through Rank-Wise Clustering

SuperLoRA: Parameter-Efficient Unified Adaptation of Multi-Layer Attention Modules

HyperLoRA: Efficient Cross-task Generalization Via Constrained Low-Rank Adapters Generation

LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks

LoRA of Change: Learning to Generate LoRA for the Editing Instruction from A Single Before-After Image Pair

Block-wise LoRA: Revisiting Fine-grained LoRA for Effective Personalization and Stylization in Text-to-Image Generation

TriLoRA: Integrating SVD for Advanced Style Personalization in Text-to-Image Generation

LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation

Retrieval-Augmented Mixture of LoRA Experts for Uploadable Machine Learning

IterIS: Iterative Inference-Solving Alignment for LoRA Merging

ResLoRA: Identity Residual Mapping in Low-Rank Adaption

SeLoRA: Self-Expanding Low-Rank Adaptation of Latent Diffusion Model for Medical Image Synthesis

Multimodal Instruction Tuning with Conditional Mixture of LoRA

mLoRA: Fine-Tuning LoRA Adapters via Highly-Efficient Pipeline Parallelism in Multiple GPUs