LoRA ensembles for large language model fine-tuning

Xi Wang, Laurence Aitchison, Maja Rudolph
2023-10-05
Abstract: Finetuned LLMs often exhibit poor uncertainty quantification, manifesting as overconfidence, poor calibration, and unreliable predictions on test data or out-of-distribution samples. One approach commonly used in vision to alleviate this issue is a deep ensemble, which is constructed by training the same model multiple times from different random initializations. However, ensembling LLMs poses a huge challenge: the most effective LLMs are very, very large. Keeping a single LLM in memory is already challenging enough; keeping an ensemble of, e.g., 5 LLMs in memory is impossible in many settings. To address these issues, we propose an ensemble approach using Low-Rank Adapters (LoRA), a parameter-efficient fine-tuning technique. Critically, these low-rank adapters contain very few parameters, orders of magnitude fewer than the underlying pre-trained model. Thus, it is possible to construct large ensembles of LoRA adapters with almost the same computational overhead as using the original model. We find that LoRA ensembles, applied on their own or on top of pre-existing regularization techniques, give consistent improvements in predictive accuracy and uncertainty quantification.
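To make the "orders of magnitude fewer" claim concrete, the following sketch counts parameters for a deep ensemble of full models versus a LoRA ensemble that shares one frozen base model. It assumes a hypothetical transformer with hidden size 4096 and 32 layers in which only two square projection matrices per layer receive rank-8 adapters; these numbers are illustrative and are not taken from the paper. A LoRA adapter on a d_out x d_in weight adds r·(d_in + d_out) parameters, so for a square d x d matrix its relative cost is 2r/d. The function names are hypothetical.

```python
def lora_adapter_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters of one LoRA update B @ A for a d_out x d_in weight.

    B has shape (d_out, rank) and A has shape (rank, d_in).
    """
    return rank * (d_in + d_out)


def ensemble_param_counts(d_model: int, n_layers: int, rank: int,
                          n_members: int, adapted_per_layer: int = 2) -> tuple[int, int]:
    """Return (deep_ensemble, lora_ensemble) parameter counts.

    Hypothetical simplification: only `adapted_per_layer` square d_model x d_model
    matrices per layer are counted; every other weight in the model is ignored.
    """
    base = n_layers * adapted_per_layer * d_model * d_model
    per_member = n_layers * adapted_per_layer * lora_adapter_params(d_model, d_model, rank)
    deep_ensemble = n_members * base                # K full copies of the model
    lora_ensemble = base + n_members * per_member   # one frozen base + K adapters
    return deep_ensemble, lora_ensemble


if __name__ == "__main__":
    deep, lora = ensemble_param_counts(d_model=4096, n_layers=32, rank=8, n_members=5)
    print(f"deep ensemble: {deep:,}  LoRA ensemble: {lora:,}")
    # deep ensemble: 5,368,709,120  LoRA ensemble: 1,094,713,344
```

With these settings each adapter costs 2r/d = 16/4096 = 1/256 of the weights it modifies, so five members need about 1.1 billion counted parameters instead of about 5.4 billion. Because the count ignores every non-adapted weight (embeddings, MLP blocks, and so on), a real base model is larger and the relative overhead of the adapters is smaller still.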
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the poor uncertainty quantification of large language models (LLMs) after fine-tuning, which manifests as overconfidence, poor calibration, and unreliable predictions on test data or out-of-distribution samples. Deep ensembles, which train the same model several times from different random initializations, are a standard remedy for this problem, but they do not scale to very large LLMs: a single LLM is already difficult to hold in memory, so keeping several copies is often impossible. The paper therefore proposes ensembles of Low-Rank Adapters (LoRA). Because LoRA, a parameter-efficient fine-tuning technique, trains only a small number of additional parameters on top of a frozen pre-trained model, many adapters can share one copy of that model, making large ensembles feasible with computational overhead close to that of the original model.

Experiments show that LoRA ensembles consistently improve both predictive accuracy and uncertainty quantification, whether applied alone or combined with other regularization techniques. The paper also studies how regularization interacts with LoRA ensembles and finds that appropriate regularization can further improve prediction accuracy and calibration.
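The mechanism can be illustrated with a toy sketch in which a single linear classification layer stands in for the LLM. All members share one frozen weight matrix W, stored once, and member k adds only its own low-rank update, so its logits are W x + s·B_k(A_k x). Following the original LoRA recipe, A_k is initialized randomly and B_k at zero, so members start identical to the base model and differ through their random initializations once trained (the training loop is omitted). Combining members by averaging their predictive probabilities, as in deep ensembles, is an assumption of this sketch, as are the class name LoRAEnsembleHead, the scale s, and the initialization standard deviation.

```python
import math
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


class LoRAEnsembleHead:
    """Toy LoRA ensemble over one frozen linear layer (hypothetical illustration).

    W: frozen weight, shape (n_out, d), shared by every member.
    Member k owns A_k (rank x d) and B_k (n_out x rank).
    """

    def __init__(self, W, rank, n_members, scale=1.0, init_std=0.02, seed=0):
        rng = random.Random(seed)
        self.W = W            # stored once, never updated
        self.scale = scale    # plays the role of LoRA's alpha / r factor
        n_out, d = len(W), len(W[0])
        self.adapters = []
        for _ in range(n_members):
            # A_k: different random init per member; B_k: zeros, so B_k A_k = 0 at start
            A = [[rng.gauss(0.0, init_std) for _ in range(d)] for _ in range(rank)]
            B = [[0.0] * rank for _ in range(n_out)]
            self.adapters.append((A, B))

    def member_logits(self, k, x):
        A, B = self.adapters[k]
        base = matvec(self.W, x)          # W x with the shared frozen weight
        delta = matvec(B, matvec(A, x))   # B_k (A_k x); B_k A_k is never formed
        return [b + self.scale * u for b, u in zip(base, delta)]

    def predict_proba(self, x):
        # Average the members' predictive distributions (deep-ensemble style)
        member_probs = [softmax(self.member_logits(k, x)) for k in range(len(self.adapters))]
        n = len(member_probs)
        return [sum(p[c] for p in member_probs) / n for c in range(len(member_probs[0]))]


if __name__ == "__main__":
    head = LoRAEnsembleHead(W=[[1.0, 0.0], [0.0, 1.0]], rank=1, n_members=3)
    # Untrained adapters (B_k = 0) reproduce the base model's prediction
    print(head.predict_proba([1.0, 0.0]))
```

Before training, every member reproduces the base model's prediction because B_k A_k = 0; any benefit from ensembling appears only after the adapters have been fine-tuned separately. Computing B_k(A_k x) as two thin matrix-vector products also avoids ever materializing the full-size update B_k A_k.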