Abstract: We show that variational learning can significantly improve the accuracy and calibration of Low-Rank Adaptation (LoRA) without a substantial increase in cost. We replace AdamW with the Improved Variational Online Newton (IVON) algorithm to finetune large language models. For Llama-2 with 7 billion parameters, IVON improves the accuracy over AdamW by 2.8% and expected calibration error by 4.6%. The accuracy is also better than the other Bayesian alternatives, yet the cost is lower and the implementation is easier. Our work provides additional evidence for the effectiveness of IVON for large language models. The code is available at https://github.com/team-approx-bayes/ivon-lora.
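The abstract reports calibration as expected calibration error (ECE). As a reminder of what that metric measures, the sketch below implements the common equal-width-bin estimator: predictions are grouped by their top-class confidence, and ECE is the sample-weighted average gap between each bin's accuracy and its mean confidence. The function name, the choice of `n_bins=10`, and the bin-boundary convention are illustrative assumptions; the paper's exact evaluation protocol may differ.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width confidence bins.

    ECE = sum over bins B of (|B| / n) * |accuracy(B) - mean_confidence(B)|.
    """
    if not confidences or len(confidences) != len(correct):
        raise ValueError("confidences and correct must be non-empty and equally long")
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Bin k covers [k/n_bins, (k+1)/n_bins); a confidence of 1.0 goes into the last bin.
        k = min(int(conf * n_bins), n_bins - 1)
        bins[k].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        accuracy = sum(1.0 for _, ok in members if ok) / len(members)
        mean_conf = sum(conf for conf, _ in members) / len(members)
        ece += len(members) / n * abs(accuracy - mean_conf)
    return ece


# Top-class confidences and whether each prediction was right (made-up numbers).
print(expected_calibration_error([0.9, 0.8, 0.6, 0.95], [True, False, True, True]))  # approximately 0.3375
```

A lower ECE means the model's stated confidence tracks its actual accuracy more closely. Because the value depends on the number of bins and on how boundaries are assigned, ECE figures are only directly comparable when computed with the same convention.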

What problem does this paper attempt to address?

The problem that this paper attempts to solve is: how can variational Bayesian learning significantly improve the accuracy and calibration of Low-Rank Adaptation (LoRA) for fine-tuning large language models (LLMs) without significantly increasing the computational cost? Specifically, the paper points out that although Bayesian methods can in principle improve both accuracy and calibration when fine-tuning large language models, in practice they often come with high computational costs. This is especially true for LoRA fine-tuning, where many Bayesian variants (such as SWAG-LoRA, LoRA ensembles, and Laplace-LoRA) require additional computation to estimate the posterior distribution or to approximate the Hessian matrix. To address this, the authors propose using the Improved Variational Online Newton (IVON) algorithm as a direct replacement for the commonly used AdamW optimizer. IVON's implementation is nearly identical to AdamW's, but because it performs variational Bayesian learning, it can significantly improve the model's accuracy and calibration with almost no increase in computational cost. Specific improvements include:

1. **Accuracy improvement**: For the 7-billion-parameter Llama-2 model, IVON improves accuracy over AdamW by 2.8% on multiple commonsense reasoning tasks.
2. **Calibration improvement**: IVON reduces the expected calibration error (ECE) by 4.6% relative to AdamW, and its calibration also holds up well against the other Bayesian methods.
3. **Low cost and easy implementation**: Switching to IVON requires changing only a few lines of code, and its computational overhead is very small (approximately 1% of the total training time), which makes the method easy to apply in practice.

In conclusion, this paper shows that adopting the IVON algorithm can significantly improve the performance of LoRA fine-tuning for large language models without significantly increasing the computational cost. This provides new ideas and evidence for future research and indicates that variational Bayesian methods are effective for large-scale language models.
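IVON is described here as a drop-in replacement for AdamW. To make concrete what changes inside the update, the following is a simplified, single-sample sketch of an IVON-style loop on a toy quadratic loss in plain Python. It follows our reading of the update rules in the IVON paper ("Variational Learning is Effective for Large Deep Networks"): weights are sampled from a Gaussian with mean `m` and a variance set by a per-weight Hessian estimate `h`, `h` is estimated from the sampled gradient via the reparameterization trick, and the mean takes a Newton-like step. The function names, the toy loss, and every hyperparameter value (`lr`, `ess`, `beta1`, `beta2`, `h0`, `weight_decay`) are illustrative assumptions; this is not the authors' `ivon-lora` code or the API of any released IVON package.

```python
import math
import random


def toy_grad(theta, a, c):
    """Gradient of the hypothetical toy loss 0.5 * sum_i a_i * (theta_i - c_i)**2."""
    return [ai * (ti - ci) for ti, ai, ci in zip(theta, a, c)]


def ivon_sketch(a, c, steps=3000, lr=0.1, ess=100.0, beta1=0.9, beta2=0.99,
                h0=1.0, weight_decay=1e-4, seed=0):
    """Single-sample IVON-style loop on the toy loss (illustrative, not a reference implementation)."""
    rng = random.Random(seed)
    d = len(a)
    m = [0.0] * d   # posterior mean: the weights that AdamW would update directly
    h = [h0] * d    # diagonal Hessian estimate, which sets the posterior variance
    g = [0.0] * d   # exponential moving average of gradients (momentum)
    for t in range(1, steps + 1):
        # Posterior std per coordinate: sigma = 1 / sqrt(ess * (h + weight_decay)).
        sigma = [1.0 / math.sqrt(ess * (hi + weight_decay)) for hi in h]
        # Sample weights theta = m + eps with eps ~ N(0, sigma^2) and take the gradient there.
        eps = [rng.gauss(0.0, s) for s in sigma]
        g_hat = toy_grad([mi + ei for mi, ei in zip(m, eps)], a, c)
        # Reparameterization-trick Hessian estimate: h_hat = g_hat * eps / sigma^2.
        h_hat = [gi * ei / (s * s) for gi, ei, s in zip(g_hat, eps, sigma)]
        for i in range(d):
            g[i] = beta1 * g[i] + (1 - beta1) * g_hat[i]
            # Moving average of h_hat plus a correction term that keeps h + weight_decay > 0.
            diff = h[i] - h_hat[i]
            h[i] = (beta2 * h[i] + (1 - beta2) * h_hat[i]
                    + 0.5 * (1 - beta2) ** 2 * diff * diff / (h[i] + weight_decay))
            g_bar = g[i] / (1 - beta1 ** t)  # bias-corrected momentum
            # Newton-like step: divide by h + weight_decay instead of AdamW's sqrt(v) + eps.
            m[i] -= lr * (g_bar + weight_decay * m[i]) / (h[i] + weight_decay)
    sigma = [1.0 / math.sqrt(ess * (hi + weight_decay)) for hi in h]
    return m, h, sigma


m, h, sigma = ivon_sketch(a=[1.0, 4.0], c=[1.0, -2.0])
print(m)      # mean: expected to end up close to c = [1.0, -2.0]
print(h)      # Hessian estimate: expected to end up close to a = [1.0, 4.0]
print(sigma)  # per-weight posterior std, smaller where the loss is sharper
```

Relative to AdamW, the extra work per step in this sketch is one Gaussian sample per weight and an elementwise Hessian update, which is why the overhead can stay small. The quantity `ess` (an effective sample size, the λ of the IVON paper) scales down the posterior variance; the value 100 is chosen only to keep the toy example well behaved.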

Variational Low-Rank Adaptation Using IVON

Variational Learning is Effective for Large Deep Networks

OLoRA: Orthonormal Low-Rank Adaptation of Large Language Models

Bayesian Low-rank Adaptation for Large Language Models

AdaRankGrad: Adaptive Gradient-Rank and Moments for Memory-Efficient LLMs Training and Fine-Tuning

VeRA: Vector-based Random Matrix Adaptation

AdvLoRA: Adversarial Low-Rank Adaptation of Vision-Language Models

LoRA ensembles for large language model fine-tuning

Gaussian Stochastic Weight Averaging for Bayesian Low-Rank Adaptation of Large Language Models

LoRA$^2$: Multi-Scale Low-Rank Approximations for Fine-Tuning Large Language Models

Delta-LoRA: Fine-Tuning High-Rank Parameters with the Delta of Low-Rank Matrices

One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation

A Note on LoRA

A Bayesian Interpretation of Adaptive Low-Rank Adaptation

LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization

Learning on LoRAs: GL-Equivariant Processing of Low-Rank Weight Spaces for Large Finetuned Models

LoRA+: Efficient Low Rank Adaptation of Large Models

BLoB: Bayesian Low-Rank Adaptation by Backpropagation for Large Language Models

LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters

Initialization using Update Approximation is a Silver Bullet for Extremely Efficient Low-Rank Fine-Tuning

Robust and Efficient Fine-tuning of LLMs with Bayesian Reparameterization of Low-Rank Adaptation