Bai Cong, Nico Daheim, Yuesong Shen, Daniel Cremers, Rio Yokota, Mohammad Emtiyaz Khan, Thomas Möllenhoff
Abstract: We show that variational learning can significantly improve the accuracy and calibration of Low-Rank Adaptation (LoRA) without a substantial increase in the cost. We replace AdamW by the Improved Variational Online Newton (IVON) algorithm to finetune large language models. For Llama-2 with 7 billion parameters, IVON improves the accuracy over AdamW by 2.8% and expected calibration error by 4.6%. The accuracy is also better than the other Bayesian alternatives, yet the cost is lower and the implementation is easier. Our work provides additional evidence for the effectiveness of IVON for large language models. The code is available at https://github.com/team-approx-bayes/ivon-lora.
What problem does this paper attempt to address?
The paper asks how variational Bayesian learning can improve the accuracy and calibration of Low-Rank Adaptation (LoRA) when fine-tuning large language models (LLMs), without a substantial increase in computational cost.
Specifically, the paper points out that Bayesian methods can in principle improve accuracy and calibration when fine-tuning large language models, but in practice they often come with high computational cost. This is especially true for LoRA fine-tuning, where many Bayesian variants (such as SWAG-LoRA, LoRA ensembles, and Laplace-LoRA) need additional computation to estimate the posterior distribution or to approximate the Hessian matrix.
To address this, the authors propose using the Improved Variational Online Newton (IVON) algorithm as a direct replacement for the commonly used AdamW optimizer. IVON's implementation is nearly identical to AdamW's, yet because it performs variational Bayesian learning it can significantly improve the model's accuracy and calibration with almost no increase in computational cost. Specific improvements include:
1. **Accuracy improvement**: For the Llama-2 model with 7 billion parameters, IVON improves accuracy by 2.8% over AdamW across multiple commonsense reasoning tasks.
2. **Calibration improvement**: IVON reduces the expected calibration error (ECE) by 4.6% and is also well calibrated compared with the other Bayesian methods.
3. **Low cost and easy implementation**: Adopting IVON requires changing only a few lines of code, and its computational overhead is very small (approximately 1% of the total training time), which makes the method easy to apply in practice.
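The expected calibration error mentioned in item 2 is commonly computed by grouping predictions into confidence bins and averaging the gap between accuracy and mean confidence in each bin, weighted by bin size. The following is a minimal sketch of that standard formula, not the paper's evaluation code: the function name `expected_calibration_error`, the choice of 10 equal-width bins, and the half-open binning convention are all assumptions made for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Illustrative ECE with equal-width bins (hypothetical helper).

    confidences: top-label probabilities in [0, 1], one per prediction.
    correct:     1 (or True) if the prediction was right, else 0 (or False).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    conf_sum = [0.0] * n_bins   # summed confidence per bin
    hit_sum = [0.0] * n_bins    # number of correct predictions per bin
    count = [0] * n_bins        # number of predictions per bin

    for c, ok in zip(confidences, correct):
        # Assumed convention: bins are [k/n_bins, (k+1)/n_bins),
        # with confidence 1.0 placed in the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        conf_sum[idx] += c
        hit_sum[idx] += 1.0 if ok else 0.0
        count[idx] += 1

    ece = 0.0
    for b in range(n_bins):
        if count[b] == 0:
            continue  # empty bins contribute nothing
        accuracy = hit_sum[b] / count[b]
        mean_conf = conf_sum[b] / count[b]
        ece += (count[b] / n) * abs(accuracy - mean_conf)
    return ece


if __name__ == "__main__":
    # Toy data, invented for this example only.
    toy_conf = [0.95, 0.95, 0.65, 0.65]
    toy_correct = [1, 1, 1, 0]
    print(expected_calibration_error(toy_conf, toy_correct))
```

A lower value means that the model's stated confidence tracks its actual accuracy more closely. The number of bins and the binning convention affect the result, so values are only comparable when computed the same way.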
In conclusion, the paper shows that replacing AdamW with IVON can significantly improve LoRA fine-tuning of large language models without a substantial increase in computational cost. It adds to the evidence that variational Bayesian methods are effective for large language models.