Abstract: Personalizing a large-scale pretrained Text-to-Image (T2I) diffusion model is challenging as it typically struggles to make an appropriate trade-off between its training data distribution and the target distribution, i.e., learning a novel concept with only a few target images to achieve personalization (aligning with the personalized target) while preserving text editability (aligning with diverse text prompts). In this paper, we propose PaRa, an effective and efficient Parameter Rank Reduction approach for T2I model personalization by explicitly controlling the rank of the diffusion model parameters to restrict its initially diverse generation space into a small and well-balanced target space. Our design is motivated by the fact that taming a T2I model toward a novel concept such as a specific art style implies a small generation space. To this end, by reducing the rank of model parameters during fine-tuning, we can effectively constrain the space of the denoising sampling trajectories towards the target. With comprehensive experiments, we show that PaRa has clear advantages over existing fine-tuning approaches on single/multi-subject generation as well as single-image editing. Notably, compared to the prevailing fine-tuning technique LoRA, PaRa achieves better parameter efficiency (2x fewer learnable parameters) and much better target image alignment.
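The "2x fewer learnable parameters" figure can be checked with simple arithmetic. This is a minimal sketch, assuming (as the method overview below states) that PaRa learns a single matrix \( B\in\mathbb{R}^{d\times r} \) per projection, while LoRA learns the two factors of its update \( \Delta W = BA \) with \( B\in\mathbb{R}^{d\times r} \) and \( A\in\mathbb{R}^{r\times k} \). The dimensions and the shared rank `r` are hypothetical values chosen for illustration.

```python
def lora_params(d: int, k: int, r: int) -> int:
    """LoRA learns B (d x r) and A (r x k) for the update delta_W = B @ A."""
    return d * r + r * k


def para_params(d: int, r: int) -> int:
    """PaRa, as summarized here, learns only B (d x r); Q is derived from QR(B)."""
    return d * r


# Hypothetical square projection (d == k), same rank r for both methods.
d, k, r = 320, 320, 8
lora = lora_params(d, k, r)
para = para_params(d, r)
print(lora, para, lora / para)  # prints: 5120 2560 2.0
```

The factor of exactly 2 holds only when \( d = k \); for a non-square projection the ratio under these assumptions is \( (d + k)/d \).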

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper addresses the challenges that large-scale pretrained text-to-image (T2I) diffusion models face during personalization. Specifically, these models struggle to strike an appropriate trade-off between the training data distribution and the target distribution: they must learn a new concept (such as a specific art style) from only a few target images while preserving text editability (alignment with diverse text prompts). The paper proposes **PaRa** (Parameter Rank Reduction), which explicitly controls the rank of the diffusion model parameters to shrink the model's initially diverse generation space into a smaller, well-balanced target space.

### Main contributions

1. **The Parameter Rank Reduction (PaRa) framework**:
   - Explicitly reducing the rank of the diffusion model parameters effectively constrains the generation space.
   - Compared with existing fine-tuning methods such as LoRA, PaRa achieves better target-image alignment and higher parameter efficiency when learning new concepts.
2. **Multi-model combination**:
   - A simple and effective method combines multiple independently fine-tuned PaRa models to achieve multi-subject T2I generation without additional training on multi-subject images.
3. **Single-image editing**:
   - PaRa supports editing directly from a single image without a noise-inversion process, improving the stability and consistency of edits.

### Experimental results

- **Single-subject generation**: PaRa achieves higher SSIM scores, indicating that its generated images are less diverse but better aligned with the target images.
- **Multi-subject generation**: PaRa generates images containing multiple subjects without over-emphasizing any one subject or producing unrealistic blended entities.
- **Single-image editing**: PaRa is more stable, and its edited images deviate less from the original images.

### Method overview

1. **Parameter Rank Reduction (PaRa)**:
   - A low-rank learnable matrix \( B\in\mathbb{R}^{d\times r} \) is introduced to reduce the rank of the pretrained linear projection \( W_0\in\mathbb{R}^{d\times k} \).
   - Using the QR decomposition \( B = QR \), the reduced weight is \( W_{\text{reduce}} = W_0 - QQ^{T}W_0 \). Because \( QQ^{T} \) projects onto the column space of \( B \), subtracting \( QQ^{T}W_0 \) removes those directions from the output space of \( W_0 \), lowering its rank by up to \( r \).
2. **Multi-model combination**:
   - Two independently trained PaRa models \( W_1 \) and \( W_2 \) are combined into \( W_m = W_0 - Q_m'Q_m'^{T}W_0 \), where \( Q_m' \) is a matrix with orthonormal columns obtained through QR decomposition.
3. **Single-image editing**:
   - Controlling the rank in PaRa balances faithful reconstruction against editability: a larger rank value improves reconstruction fidelity, while a smaller rank value increases the diversity of the generated images.

### Conclusion

By explicitly controlling the rank of the diffusion model parameters, PaRa effectively addresses the problem of constraining the generation space during T2I model personalization. Experiments show that PaRa performs well on single-subject generation, multi-subject generation, and single-image editing, with high parameter efficiency and generation quality.
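The rank-reduction and combination formulas from the method overview can be sanity-checked numerically. This NumPy sketch uses random matrices: `W0`, `B1`, `B2`, and the dimensions are hypothetical stand-ins, not real model weights, and the function names `para_reduce` and `para_merge` are illustrative. The merge step is an assumption: the summary says only that \( Q_m' \) comes from a QR decomposition, so here the two models' \( Q \) factors are concatenated and re-orthonormalized.

```python
import numpy as np


def para_reduce(W0: np.ndarray, B: np.ndarray) -> np.ndarray:
    """W_reduce = W0 - Q Q^T W0, where B = QR (reduced QR: Q has orthonormal columns)."""
    Q, _ = np.linalg.qr(B)  # Q has shape (d, r)
    return W0 - Q @ (Q.T @ W0)


def para_merge(W0: np.ndarray, B1: np.ndarray, B2: np.ndarray) -> np.ndarray:
    """Hypothetical merge: orthonormalize the concatenated bases of two PaRa models."""
    Q1, _ = np.linalg.qr(B1)
    Q2, _ = np.linalg.qr(B2)
    Qm, _ = np.linalg.qr(np.concatenate([Q1, Q2], axis=1))  # Q_m' has shape (d, r1 + r2)
    return W0 - Qm @ (Qm.T @ W0)


rng = np.random.default_rng(0)
d, k, r = 64, 96, 8
W0 = rng.standard_normal((d, k))  # stand-in for a pretrained projection (full rank 64)
B1 = rng.standard_normal((d, r))  # stand-in for one model's learned B
B2 = rng.standard_normal((d, r))  # stand-in for a second model's learned B

W_reduce = para_reduce(W0, B1)
W_m = para_merge(W0, B1, B2)
print(
    np.linalg.matrix_rank(W0),
    np.linalg.matrix_rank(W_reduce),
    np.linalg.matrix_rank(W_m),
)  # expected: 64 56 48
```

Each rank-\( r \) model removes \( r \) directions from the output space of \( W_0 \), so under this assumed merge, combining two rank-8 models removes 16 directions. This matches the summary's description of \( W_m \) but is not taken from the paper's code.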

PaRa: Personalizing Text-to-Image Diffusion via Parameter Rank Reduction

Block-wise LoRA: Revisiting Fine-grained LoRA for Effective Personalization and Stylization in Text-to-Image Generation

Key-Locked Rank One Editing for Text-to-Image Personalization

SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation

LoRA Diffusion: Zero-Shot LoRA Synthesis for Diffusion Model Personalization

Parameter efficient finetuning of text-to-image models with trainable self-attention layer

Prior Preserved Text-to-Image Personalization Without Image Regularization

StyleInject: Parameter Efficient Tuning of Text-to-Image Diffusion Models

IntLoRA: Integral Low-rank Adaptation of Quantized Diffusion Models

DiffLoRA: Generating Personalized Low-Rank Adaptation Weights with Diffusion

Unlocking the Potential of Text-to-Image Diffusion with PAC-Bayesian Theory

Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning

AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning

Conditional LoRA Parameter Generation

Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation

PreciseControl: Enhancing Text-To-Image Diffusion Models with Fine-Grained Attribute Control

Do We Need to Design Specific Diffusion Models for Different Tasks? Try ONE-PIC

IncreLoRA: Incremental Parameter Allocation Method for Parameter-Efficient Fine-tuning

ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning

Encoder-based Domain Tuning for Fast Personalization of Text-to-Image Models

PatchDPO: Patch-level DPO for Finetuning-free Personalized Image Generation