LoRA ensembles for large language model fine-tuning

Xi Wang, Laurence Aitchison, Maja Rudolph

2023-10-05

Abstract: Fine-tuned LLMs often exhibit poor uncertainty quantification, manifesting as overconfidence, poor calibration, and unreliable prediction results on test data or out-of-distribution samples. One approach commonly used in vision for alleviating this issue is a deep ensemble, which constructs an ensemble by training the same model multiple times using different random initializations. However, there is a huge challenge to ensembling LLMs: the most effective LLMs are very, very large. Keeping a single LLM in memory is already challenging enough; keeping an ensemble of, e.g., 5 LLMs in memory is impossible in many settings. To address these issues, we propose an ensemble approach using Low-Rank Adapters (LoRA), a parameter-efficient fine-tuning technique. Critically, these low-rank adapters represent a very small number of parameters, orders of magnitude fewer than the underlying pre-trained model. Thus, it is possible to construct large ensembles of LoRA adapters with almost the same computational overhead as using the original model. We find that LoRA ensembles, applied on their own or on top of pre-existing regularization techniques, give consistent improvements in predictive accuracy and uncertainty quantification.
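The parameter-count argument in the abstract can be made concrete with a little arithmetic. The sketch below assumes the standard LoRA parameterization, in which a weight matrix of shape d_out x d_in receives an update B A with B of shape d_out x r and A of shape r x d_in. The layer size (4096 x 4096), the rank (8), and the function names are illustrative assumptions, not values taken from the paper; the ensemble size of 5 follows the abstract's example.

```python
def full_params(d_in, d_out):
    """Number of weights in one dense layer of shape d_out x d_in."""
    return d_in * d_out


def lora_params(d_in, d_out, rank):
    """Number of weights in one LoRA adapter: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


# Illustrative sizes (assumptions, not from the paper).
d_in = d_out = 4096
rank = 8
members = 5

full = full_params(d_in, d_out)
adapter = lora_params(d_in, d_out, rank)

# Deep ensemble: every member stores its own copy of the dense layer.
deep_ensemble = members * full

# LoRA ensemble: one shared frozen dense layer plus one adapter per member.
lora_ensemble = full + members * adapter

print(f"dense layer:        {full:>12,}")
print(f"one LoRA adapter:   {adapter:>12,}")
print(f"deep ensemble (5):  {deep_ensemble:>12,}")
print(f"LoRA ensemble (5):  {lora_ensemble:>12,}")
```

Under these assumed sizes, a single adapter holds 256 times fewer weights than the layer it adapts, so five adapters add roughly 2% to the storage of the shared layer.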

Machine Learning, Artificial Intelligence

What problem does this paper attempt to address?

The paper addresses the poor uncertainty quantification that arises after fine-tuning large language models (LLMs), which manifests as overconfidence, poor calibration, and unreliable predictions on test data or out-of-distribution samples. Deep ensembles (training the same model multiple times with different random initializations) are a standard way to mitigate this problem, but they run into memory limits with very large LLMs: a single LLM is already difficult to keep in memory, so keeping an ensemble of several is often impossible. The paper therefore proposes an ensemble built from Low-Rank Adapters (LoRA). As a parameter-efficient fine-tuning technique, LoRA adds only a small number of parameters per ensemble member, which makes it feasible to construct large LoRA ensembles with computational overhead nearly equivalent to using the original model. Experimental results show that LoRA ensembles give consistent improvements in both predictive accuracy and uncertainty quantification, whether applied alone or on top of existing regularization techniques. The paper also explores the impact of regularization on LoRA ensembles and finds that appropriate regularization strategies can further improve prediction accuracy and calibration.
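The ensembling idea summarized above can be sketched for a single linear classification layer. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the random adapters stand in for adapters that would each be fine-tuned from a different random initialization, and the ensemble prediction is taken to be the average of the members' softmax probabilities, as is usual for deep ensembles.

```python
import math
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(W, adapters, x):
    """Average class probabilities over LoRA members sharing the frozen weight W."""
    base = matvec(W, x)  # shared by all members, computed once
    member_probs = []
    for A, B in adapters:
        # Low-rank update applied as B (A x); the dense matrix B A is never formed.
        delta = matvec(B, matvec(A, x))
        member_probs.append(softmax([b + d for b, d in zip(base, delta)]))
    n = len(member_probs)
    return [sum(p[k] for p in member_probs) / n for k in range(len(base))]


def random_matrix(rows, cols, rng, scale=0.5):
    """Gaussian random matrix, a stand-in for trained weights (assumption)."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_in, n_classes, rank, members = 6, 3, 2, 5  # toy sizes chosen for illustration

W = random_matrix(n_classes, d_in, rng)  # frozen "pre-trained" weight
adapters = [
    (random_matrix(rank, d_in, rng), random_matrix(n_classes, rank, rng))
    for _ in range(members)
]
x = [rng.gauss(0.0, 1.0) for _ in range(d_in)]

probs = ensemble_predict(W, adapters, x)
print([round(p, 3) for p in probs])
```

Reusing `base` across members works here only because the example has one layer. In a deep network the activations of different members diverge after the first adapted layer, so what is shared in practice is the frozen weights held in memory rather than the intermediate computation.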

Uncertainty quantification in fine-tuned LLMs using LoRA ensembles

LoRA$^2$ : Multi-Scale Low-Rank Approximations for Fine-Tuning Large Language Models

LoRA-Ensemble: Efficient Uncertainty Modelling for Self-attention Networks

Less is More: Extreme Gradient Boost Rank-1 Adaption for Efficient Finetuning of LLMs

LoRA-Mini : Adaptation Matrices Decomposition and Selective Training

Chain of LoRA: Efficient Fine-tuning of Language Models via Residual Learning

LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters

OLoRA: Orthonormal Low-Rank Adaptation of Large Language Models

LaMDA: Large Model Fine-Tuning via Spectrally Decomposed Low-Dimensional Adaptation

ALLoRA: Adaptive Learning Rate Mitigates LoRA Fatal Flaws

Learning on LoRAs: GL-Equivariant Processing of Low-Rank Weight Spaces for Large Finetuned Models

GeLoRA: Geometric Adaptive Ranks For Efficient LoRA Fine-tuning

LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report

LoRTA: Low Rank Tensor Adaptation of Large Language Models

LoRA-FA: Memory-efficient Low-rank Adaptation for Large Language Models Fine-tuning

MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning

Bayesian Low-rank Adaptation for Large Language Models

Delta-LoRA: Fine-Tuning High-Rank Parameters with the Delta of Low-Rank Matrices

OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning

LoRA Learns Less and Forgets Less