SuperLoRA: Parameter-Efficient Unified Adaptation of Multi-Layer Attention Modules

Xiangyu Chen, Jing Liu, Ye Wang, Pu Perry Wang, Matthew Brand, Guanghui Wang, Toshiaki Koike-Akino
2024-03-18
Abstract: Low-rank adaptation (LoRA) and its variants are widely employed in fine-tuning large models, including large language models for natural language processing and diffusion models for computer vision. This paper proposes a generalized framework called SuperLoRA that unifies and extends different LoRA variants, which can be realized under different hyper-parameter settings. Introducing grouping, folding, shuffling, projecting, and tensor factoring, SuperLoRA offers high flexibility compared with other LoRA variants and demonstrates superior performance for transfer learning tasks, especially in the extremely few-parameter regimes.
Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper addresses the high resource consumption and data requirements of adapting large neural network models to downstream tasks, both for vision models (such as Vision Transformer and ConvNeXt) and for large language models (such as GPT, PaLM2, Gemini, and LLaMA2). To tackle these problems, the authors propose SuperLoRA, a new parameter-efficient fine-tuning framework. SuperLoRA unifies and extends different low-rank adaptation (LoRA) variants and provides a more flexible way to parameterize the weight updates of different attention modules. Specifically, it introduces grouping, folding, shuffling, projection, and tensor decomposition, which yield superior transfer learning performance, particularly when the number of parameters is extremely small.

The key contributions of the paper are as follows:

1. **Proposing the SuperLoRA framework**: A new parameter-efficient fine-tuning framework that unifies and extends most LoRA variants.
2. **Parameter-efficient weight updates**: Through projected tensor rank decomposition, SuperLoRA can jointly adapt all weights across layers while offering a wide range of adjustable parameter counts.
3. **Investigating the impact of various techniques**: The paper studies how tensor reshaping, grouping, random projection, and shuffling affect performance.
4. **Empirical results**: SuperLoRA shows high parameter efficiency on two transfer learning tasks (image classification and image generation) for large vision Transformers and diffusion models.
5. **Significant parameter reduction**: SuperLoRA achieves a 3- to 10-fold reduction in parameter count.

Through these contributions, SuperLoRA provides a general framework for existing LoRA variants while achieving better performance and higher parameter efficiency in practical applications.
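To make the idea of jointly adapting weights across layers more concrete, the following is a minimal parameter-counting sketch. It is not the paper's implementation. It assumes that standard LoRA attaches one rank-`r` update to each weight matrix, and that a grouped variant concatenates all weight updates, splits them into `num_groups` equal chunks, and folds each chunk into a roughly square matrix with its own rank-`r` factorization. The function names, the layer shapes, and the rank are hypothetical choices made for illustration only.

```python
import math


def lora_param_count(d_in, d_out, rank):
    """Parameters of one standard LoRA update dW = B @ A,
    with B of shape (d_out, rank) and A of shape (rank, d_in)."""
    return rank * (d_in + d_out)


def grouped_param_count(layer_shapes, num_groups, rank):
    """Hypothetical grouped-and-folded count (an assumption, not the paper's code).

    All weight-update entries are pooled, split into num_groups chunks,
    and each chunk is folded into a square matrix of side `side`
    that receives its own rank-`rank` factorization.
    """
    total = sum(d_in * d_out for d_in, d_out in layer_shapes)
    per_group = math.ceil(total / num_groups)
    side = math.ceil(math.sqrt(per_group))
    return num_groups * rank * 2 * side


# Assumed setting: 24 square 768 x 768 matrices, for example the query and
# value projections of a 12-layer Transformer with hidden size 768.
shapes = [(768, 768)] * 24

per_layer = sum(lora_param_count(d_in, d_out, rank=4) for d_in, d_out in shapes)
grouped = grouped_param_count(shapes, num_groups=1, rank=4)

print(per_layer)  # 147456
print(grouped)    # 30104
```

With these assumed shapes, a single folded group needs far fewer parameters than per-matrix LoRA at the same rank. Equal rank does not imply equal accuracy, so the counts illustrate only how grouping and folding change the parameter budget, not the results reported in the paper.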