FouRA: Fourier Low Rank Adaptation

Shubhankar Borse, Shreya Kadambi, Nilesh Prasad Pandey, Kartikeya Bhardwaj, Viswanath Ganapathy, Sweta Priyadarshi, Risheek Garrepalli, Rafael Esteves, Munawar Hayat, Fatih Porikli
2024-06-13
Abstract: While Low-Rank Adaptation (LoRA) has proven beneficial for efficiently fine-tuning large models, LoRA fine-tuned text-to-image diffusion models lack diversity in the generated images, as the model tends to copy data from the observed training samples. This effect becomes more pronounced at higher values of adapter strength and for adapters with higher ranks which are fine-tuned on smaller datasets. To address these challenges, we present FouRA, a novel low-rank method that learns projections in the Fourier domain along with learning a flexible input-dependent adapter rank selection strategy. Through extensive experiments and analysis, we show that FouRA successfully solves the problems related to data copying and distribution collapse while significantly improving the generated image quality. We demonstrate that FouRA enhances the generalization of fine-tuned models thanks to its adaptive rank selection. We further show that the learned projections in the frequency domain are decorrelated and prove effective when merging multiple adapters. While FouRA is motivated for vision tasks, we also demonstrate its merits for language tasks on the GLUE benchmark.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper examines a limitation of Low-Rank Adaptation (LoRA) in text-to-image generation. LoRA fine-tunes large models efficiently, but when it is applied to text-to-image diffusion models the generated images lack diversity, because the model tends to replicate its training samples. The effect is most pronounced at high adapter strength and for higher-rank adapters fine-tuned on smaller datasets. To address this, the paper proposes FouRA (Fourier Low Rank Adaptation), a low-rank method that learns projections in the Fourier domain and selects the adapter rank flexibly, depending on the input. Through extensive experiments and analysis, the paper reports that FouRA resolves data copying and distribution collapse while significantly improving the quality of the generated images. The adaptive rank selection improves the generalization of the fine-tuned model, and the projections learned in the frequency domain are shown to be decorrelated, which makes them effective when merging multiple adapters. Although FouRA is motivated by vision tasks, it also shows merits on language tasks, as evaluated on the GLUE benchmark.
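The two ingredients named in the summary, projecting in a frequency domain and choosing the effective rank per input, can be sketched roughly as follows. This is an illustrative NumPy sketch under stated assumptions, not the authors' implementation: the function name `fourier_gated_delta`, the FFT along the feature axis, the sigmoid gate with a hypothetical weight matrix `G`, and keeping only the real part after the inverse transform are all choices made here for illustration.

```python
import numpy as np

def fourier_gated_delta(x, A, B, G, alpha=1.0):
    """Hypothetical low-rank update computed in a frequency domain.

    x: (n, d_in) inputs, A: (r, d_in) down-projection,
    B: (d_out, r) up-projection, G: (r, r) gate weights (assumed form),
    alpha: adapter strength.
    """
    # Move the input features into the frequency domain (assumed: orthonormal FFT).
    xf = np.fft.fft(x, axis=-1, norm="ortho")        # complex, (n, d_in)
    # Down-project to the rank-r subspace.
    z = xf @ A.T                                     # complex, (n, r)
    # Input-dependent soft gate in (0, 1) per rank component (assumed sigmoid form).
    gate = 1.0 / (1.0 + np.exp(-(np.abs(z) @ G.T)))  # real, (n, r)
    z = z * gate                                     # down-weights some rank components
    # Up-project and return to the original feature domain.
    yf = z @ B.T                                     # complex, (n, d_out)
    y = np.fft.ifft(yf, axis=-1, norm="ortho").real  # assumed: keep the real part only
    return alpha * y

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))
A = rng.standard_normal((3, 16))
B = rng.standard_normal((8, 3))
G = rng.standard_normal((3, 3))
y = fourier_gated_delta(x, A, B, G)
```

Because the gate depends on each input row, different inputs can suppress different rank components, which is one simple way to realize an input-dependent rank. As with plain LoRA, an all-zero `B` yields a zero update, so the adapter would start out as a no-op.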