LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization

Jui-Nan Yen, Si Si, Zhao Meng, Felix Yu, Sai Surya Duvvuri, Inderjit S. Dhillon, Cho-Jui Hsieh, Sanjiv Kumar
2024-10-28
Abstract: Low-rank adaptation (LoRA) is a widely used parameter-efficient fine-tuning method for LLMs that reduces memory requirements. However, current LoRA optimizers lack transformation invariance, meaning the actual updates to the weights depend on how the two LoRA factors are scaled or rotated. This deficiency leads to inefficient learning and sub-optimal solutions in practice. This paper introduces LoRA-RITE, a novel adaptive matrix preconditioning method for LoRA optimization, which achieves transformation invariance while remaining computationally efficient. We provide theoretical analysis to demonstrate the benefit of our method and conduct experiments on various LLM tasks with different models, including Gemma 2B, Gemma 7B, and mT5-XXL. The results demonstrate consistent improvements over existing optimizers. For example, replacing Adam with LoRA-RITE during LoRA fine-tuning of Gemma-2B yielded a 4.6% accuracy gain on Super-Natural Instructions and a 3.5% accuracy gain across four other LLM benchmarks (HellaSwag, ArcChallenge, GSM8K, OpenBookQA).
Machine Learning, Artificial Intelligence, Computation and Language
What problem does this paper attempt to address?
The paper addresses the lack of transformation invariance in existing optimizers for Low-Rank Adaptation (LoRA): the weight updates they produce depend on how the LoRA factors are scaled or rotated. This is not only mathematically inconsistent but also causes inefficient learning and suboptimal solutions during training. Specifically, the paper points out:

1. **Importance of Transformation Invariance**: In LoRA, the same weight matrix can be written as a product of two low-rank factors in many different ways (for example, by scaling one factor up and the other down by the same amount). Ideally, the optimizer should produce the same weight update for all of these decompositions. However, existing optimizers such as Adam, Adagrad, and RMSProp cannot guarantee this when applied to LoRA, which leads to inefficient training.
2. **Deficiencies of Existing Optimizers**: Through theoretical analysis and experiments, the paper shows that existing optimizers lack transformation invariance in LoRA optimization. Besides the mathematical inconsistency, this in practice causes one LoRA factor to dominate the optimization while the other remains almost unchanged.

To address these issues, the paper proposes a new optimizer, LoRA-RITE (Robust Invariant Transformation Equilibration). It uses a transformation-invariant adaptive matrix preconditioner whose computations stay on the low-rank side, so the method remains computationally efficient. Experiments show that LoRA-RITE consistently outperforms existing optimizers across multiple datasets and models. For example, replacing Adam with LoRA-RITE for LoRA fine-tuning of Gemma-2B improves accuracy on Super-Natural Instructions by 4.6% and average accuracy on four other LLM benchmarks (HellaSwag, ArcChallenge, GSM8K, OpenBookQA) by 3.5%.
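To make the invariance issue concrete, here is a small NumPy sketch (not taken from the paper) that builds two factorizations of the same weight, A B^T = A2 B2^T with A2 = A R and B2 = B R^(-T), and checks whether an update rule changes the weight by the same amount in both cases. It assumes W = A B^T with A of shape m x r and B of shape n x r, uses a random matrix G as a stand-in for the loss gradient dL/dW, and models Adam only on its first bias-corrected step, where the update reduces to roughly lr * sign(g). For contrast it includes a ScaledGD-style preconditioner from the low-rank matrix estimation literature, which multiplies each factor's gradient by the inverse Gram matrix of the other factor; this is a swapped-in illustration of an invariant update, not the LoRA-RITE preconditioner. All function names are hypothetical.

```python
import numpy as np

def lora_grads(G, A, B):
    # For W = A @ B.T and G = dL/dW, the chain rule gives
    # dL/dA = G @ B and dL/dB = G.T @ A.
    return G @ B, G.T @ A

def sgd_step(G, A, B, lr):
    gA, gB = lora_grads(G, A, B)
    return -lr * gA, -lr * gB

def adam_first_step(G, A, B, lr, eps=1e-12):
    # Bias-corrected Adam on its first step: m_hat = g, v_hat = g**2,
    # so the element-wise update is lr * g / (|g| + eps), roughly lr * sign(g).
    gA, gB = lora_grads(G, A, B)
    return -lr * gA / (np.abs(gA) + eps), -lr * gB / (np.abs(gB) + eps)

def scaled_gd_step(G, A, B, lr):
    # ScaledGD-style preconditioning (illustration only, NOT LoRA-RITE):
    # right-multiply each factor's gradient by the inverse r x r Gram
    # matrix of the other factor.
    gA, gB = lora_grads(G, A, B)
    return -lr * gA @ np.linalg.inv(B.T @ B), -lr * gB @ np.linalg.inv(A.T @ A)

def weight_update(step, G, A, B, lr=1e-2):
    # Change in W = A @ B.T after updating both factors with `step`.
    dA, dB = step(G, A, B, lr)
    return (A + dA) @ (B + dB).T - A @ B.T

rng = np.random.default_rng(0)
m, n, r = 8, 6, 2
A = rng.standard_normal((m, r))
B = rng.standard_normal((n, r))
G = rng.standard_normal((m, n))  # stand-in for the loss gradient dL/dW

# A second factorization of the same W: scale by 10 and rotate by 0.3 rad.
theta = 0.3
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
R = 10.0 * Q
A2 = A @ R
B2 = B @ np.linalg.inv(R).T
assert np.allclose(A @ B.T, A2 @ B2.T)  # same weight, different factors

for name, step in [("sgd", sgd_step), ("adam_step1", adam_first_step),
                   ("scaled_gd", scaled_gd_step)]:
    same = np.allclose(weight_update(step, G, A, B),
                       weight_update(step, G, A2, B2))
    print(name, same)
# expected: sgd False, adam_step1 False, scaled_gd True
```

With R = 10 * Q, the weight change contributed by updating A2 shrinks by a factor of 100 under plain gradient descent, while the change contributed by updating B2 grows by a factor of 100, so one factor effectively dominates the step. Plain gradient descent is invariant to a pure rotation (R orthogonal) but not to rescaling, whereas Adam's element-wise normalization generally breaks invariance under rotation as well.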