MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning

Ting Jiang, Shaohan Huang, Shengyue Luo, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang, Deqing Wang, Fuzhen Zhuang
2024-05-20
Abstract: Low-rank adaptation is a popular parameter-efficient fine-tuning method for large language models. In this paper, we analyze the impact of low-rank updating, as implemented in LoRA. Our findings suggest that the low-rank updating mechanism may limit the ability of LLMs to effectively learn and memorize new knowledge. Inspired by this observation, we propose a new method called MoRA, which employs a square matrix to achieve high-rank updating while maintaining the same number of trainable parameters. To achieve this, we introduce corresponding non-parameter operators that reduce the input dimension and increase the output dimension for the square matrix. Furthermore, these operators ensure that the weight can be merged back into LLMs, so our method can be deployed like LoRA. We perform a comprehensive evaluation of our method across five tasks: instruction tuning, mathematical reasoning, continual pretraining, memory, and pretraining. Our method outperforms LoRA on memory-intensive tasks and achieves comparable performance on other tasks.
Computation and Language, Machine Learning
What problem does this paper attempt to address?
The paper examines how Low-Rank Adaptation (LoRA), a Parameter-Efficient Fine-Tuning (PEFT) method, affects what Large Language Models (LLMs) can learn during fine-tuning. LoRA reduces the memory required for training by restricting the weight update to a product of low-rank matrices, but the authors find that this restriction may limit the ability of LLMs to learn and memorize new knowledge.
To address this issue, they propose MoRA, which replaces the two low-rank matrices with a single square matrix to achieve a high-rank update while keeping the same number of trainable parameters as LoRA. Because the square matrix is smaller than the layer it adapts, MoRA adds non-parameter operators (operators with no trainable weights) that reduce the input dimension before the square matrix and increase the output dimension after it. These operators also ensure that the learned update can be merged back into the LLM weights, just as with LoRA.
MoRA is evaluated on five tasks: instruction tuning, mathematical reasoning, continual pretraining, memory, and pretraining. The results show that MoRA outperforms LoRA on memory-intensive tasks and performs comparably to LoRA on the others. The paper also distinguishes between types of fine-tuning: instruction tuning mainly teaches response format rather than new knowledge, whereas mathematical reasoning and continual pretraining require the model to acquire new knowledge and capabilities. The experiments indicate that low-rank updates are a limitation on memory-intensive tasks and that high-rank updates mitigate it. The paper further discusses related work on LoRA and other PEFT methods, including approaches that improve LoRA by increasing the rank of the update. Overall, the comparison of LoRA and MoRA across tasks supports the effectiveness of high-rank updating.
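The rank argument above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the helper names (`lora_param_count`, `square_size`, `compress_matrix`, `decompress_matrix`) are hypothetical, and the particular fixed operators used here (summing input coordinates that share an index modulo the square size, and copying outputs back out the same way) are an assumption chosen only because they have no trainable parameters and can be written as matrices. The sketch compares the rank of a LoRA-style update with that of a square-matrix update under the same parameter budget, and checks that the square-matrix update can be merged into the base weight.

```python
import math
import numpy as np


def lora_param_count(d, k, r):
    """Trainable parameters of a rank-r LoRA update for a d x k weight."""
    return r * (d + k)


def square_size(d, k, r):
    """Side of the largest square matrix within the same parameter budget."""
    return math.isqrt(r * (d + k))


def compress_matrix(k, r_hat):
    """Fixed, non-trainable map from k inputs to r_hat values (assumed choice):
    sums the input coordinates that share the same index modulo r_hat."""
    C = np.zeros((r_hat, k))
    C[np.arange(k) % r_hat, np.arange(k)] = 1.0
    return C


def decompress_matrix(d, r_hat):
    """Fixed, non-trainable map from r_hat values to d outputs (assumed choice):
    output i copies value i modulo r_hat."""
    D = np.zeros((d, r_hat))
    D[np.arange(d), np.arange(d) % r_hat] = 1.0
    return D


rng = np.random.default_rng(0)
d, k, r = 64, 64, 2              # toy layer size and LoRA rank
r_hat = square_size(d, k, r)     # side of the square matrix

# LoRA-style update: product of two thin matrices (random values stand in
# for trained ones), so its rank cannot exceed r.
B = rng.standard_normal((d, r))
A = rng.standard_normal((r, k))
delta_lora = B @ A

# Square-matrix update: one r_hat x r_hat trainable matrix between the
# fixed compress and decompress operators.
M = rng.standard_normal((r_hat, r_hat))
C = compress_matrix(k, r_hat)
D = decompress_matrix(d, r_hat)
delta_mora = D @ M @ C           # full-size update, mergeable into W

# Merging check: applying the operators at run time equals using W + delta.
W = rng.standard_normal((d, k))
x = rng.standard_normal(k)
unmerged = W @ x + D @ (M @ (C @ x))
merged = (W + delta_mora) @ x

print("LoRA params:", lora_param_count(d, k, r), "square params:", M.size)
print("rank of LoRA update:", np.linalg.matrix_rank(delta_lora))
print("rank of square update:", np.linalg.matrix_rank(delta_mora))
print("merge matches:", np.allclose(unmerged, merged))
```

With these toy sizes both variants train 256 parameters, yet the LoRA-style update is limited to rank 2 while the square-matrix update can reach rank 16. Because the compress and decompress steps here are linear, the whole update collapses into a single d x k matrix, which is what makes merging possible in this sketch.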