PaRa: Personalizing Text-to-Image Diffusion via Parameter Rank Reduction

Shangyu Chen, Zizheng Pan, Jianfei Cai, Dinh Phung
2024-06-09
Abstract: Personalizing a large-scale pretrained Text-to-Image (T2I) diffusion model is challenging as it typically struggles to make an appropriate trade-off between its training data distribution and the target distribution, i.e., learning a novel concept with only a few target images to achieve personalization (aligning with the personalized target) while preserving text editability (aligning with diverse text prompts). In this paper, we propose PaRa, an effective and efficient Parameter Rank Reduction approach for T2I model personalization by explicitly controlling the rank of the diffusion model parameters to restrict its initial diverse generation space into a small and well-balanced target space. Our design is motivated by the fact that taming a T2I model toward a novel concept such as a specific art style implies a small generation space. To this end, by reducing the rank of model parameters during finetuning, we can effectively constrain the space of the denoising sampling trajectories towards the target. With comprehensive experiments, we show that PaRa achieves great advantages over existing finetuning approaches on single/multi-subject generation as well as single-image editing. Notably, compared to the prevailing fine-tuning technique LoRA, PaRa achieves better parameter efficiency (2x fewer learnable parameters) and much better target image alignment.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses the challenges that large-scale pre-trained text-to-image (T2I) diffusion models face during personalization. Specifically, these models often struggle to make an appropriate trade-off between the training data distribution and the target distribution: they must learn a new concept (such as a specific artistic style) from only a few target images while preserving text editability (alignment with diverse text prompts). The paper proposes **PaRa** (Parameter Rank Reduction), which explicitly controls the rank of the diffusion model parameters to shrink the model's initially diverse generation space into a smaller, well-balanced target space.

### Main contributions

1. **Proposing the Parameter Rank Reduction (PaRa) framework**:
   - Explicitly reducing the rank of the diffusion model parameters constrains the generation space.
   - Compared to existing fine-tuning methods such as LoRA, PaRa achieves better target-image alignment and higher parameter efficiency (2x fewer learnable parameters) when learning new concepts.
2. **Multi-model combination**:
   - A simple method combines multiple independently fine-tuned PaRa models to achieve multi-subject T2I generation without additional training on multi-subject images.
3. **Single-image editing**:
   - PaRa supports editing directly from a single image without a noise-inversion process, which improves the stability and consistency of editing.

### Experimental results

- **Single-subject generation**:
  - PaRa achieves higher SSIM scores in single-subject generation, indicating that the generated images are less diverse but better aligned with the target images.
- **Multi-subject generation**:
  - PaRa generates images containing multiple subjects without over-emphasizing one subject or producing unrealistic blended entities.
- **Single-image editing**:
  - PaRa is more stable in single-image editing, and the generated images deviate less from the original images.

### Method overview

1. **Parameter Rank Reduction (PaRa)**:
   - Low-rank learnable parameters \( B\in\mathbb{R}^{d\times r} \) are introduced to reduce the rank of the pre-trained linear projection \( W_0\in\mathbb{R}^{d\times k} \).
   - Using the QR decomposition \( B = QR \), the reduced weight is \( W_{\text{reduce}}=W_0 - QQ^{T}W_0 \). This removes the component of \( W_0 \) that lies in the column space of \( Q \), reducing the dimension of the output space.
2. **Multi-model combination**:
   - Two independently trained PaRa models \( W_1 \) and \( W_2 \) are combined into \( W_m = W_0 - Q_m'Q_m'^{T}W_0 \), where \( Q_m' \) is a matrix with orthonormal columns obtained through QR decomposition.
3. **Single-image editing**:
   - Controlling the rank in PaRa balances faithful reconstruction against editability. A larger rank \( r \) for \( B \) enhances reconstruction fidelity, while a smaller \( r \) improves the diversity of the generated images.

### Conclusion

PaRa constrains the generation space during T2I model personalization by explicitly controlling the rank of the diffusion model parameters. Experimental results show that PaRa performs well in single-subject generation, multi-subject generation, and single-image editing, with high parameter efficiency and generation quality.
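The rank-reduction formula from the method overview, \( W_{\text{reduce}}=W_0 - QQ^{T}W_0 \) with \( B = QR \), can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `para_reduce` and `para_combine`, the matrix sizes, and the random data are all hypothetical. In particular, the summary only says that \( Q_m' \) comes from a QR decomposition, so the sketch assumes it is obtained from the concatenation \( [Q_1, Q_2] \) of the two models' bases. The parameter counts assume the standard LoRA parameterization with factors of shape \( d\times r \) and \( r\times k \).

```python
import numpy as np


def para_reduce(W0: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Return W_reduce = W0 - Q Q^T W0, where B = QR (reduced QR)."""
    Q, _ = np.linalg.qr(B)  # Q has shape (d, r) with orthonormal columns
    return W0 - Q @ (Q.T @ W0)


def para_combine(W0: np.ndarray, B1: np.ndarray, B2: np.ndarray) -> np.ndarray:
    """Combine two PaRa parameter sets.

    Assumption (not confirmed by the summary): Q_m' is the Q factor of
    the concatenated bases [Q1, Q2].
    """
    Q1, _ = np.linalg.qr(B1)
    Q2, _ = np.linalg.qr(B2)
    Qm, _ = np.linalg.qr(np.concatenate([Q1, Q2], axis=1))
    return W0 - Qm @ (Qm.T @ W0)


# Hypothetical sizes: a square projection, as is common in attention layers.
rng = np.random.default_rng(0)
d, k, r = 32, 32, 4
W0 = rng.standard_normal((d, k))   # stand-in for a pretrained weight
B = rng.standard_normal((d, r))    # stand-in for the learnable PaRa matrix

W_reduce = para_reduce(W0, B)
rank_before = np.linalg.matrix_rank(W0)
rank_after = np.linalg.matrix_rank(W_reduce)
print("rank before:", rank_before, "rank after:", rank_after)

# Combining two independently "trained" matrices removes both subspaces.
B2 = rng.standard_normal((d, r))
W_m = para_combine(W0, B, B2)
print("rank after combining:", np.linalg.matrix_rank(W_m))

# Learnable parameter counts: PaRa trains only B (d x r), while LoRA
# trains two factors (d x r and r x k).
para_params = d * r
lora_params = d * r + r * k
print("PaRa params:", para_params, "LoRA params:", lora_params)
```

For a full-rank square \( W_0 \), the rank drops by exactly \( r \) because \( I - QQ^{T} \) projects onto the orthogonal complement of the column space of \( Q \). When \( d = k \), the parameter count of \( B \) alone is half that of a LoRA pair at the same \( r \), which matches the 2x figure quoted in the abstract.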