Abstract: It is common practice in natural language processing to pre-train a single model on a general domain and then fine-tune it for downstream tasks. For Large Language Models, however, fine-tuning the entire model can be computationally expensive and therefore energy-intensive. As a result, several Parameter-Efficient Fine-Tuning (PEFT) approaches have recently been proposed. One of the most popular is low-rank adaptation (LoRA), whose key insight is to decompose the weight updates of the pre-trained model into two low-rank matrices. However, existing approaches either use the same rank value across all weight matrices, which has been shown to be a sub-optimal choice, or do not use any quantization technique, one of the most important factors in a model's energy consumption. In this work, we propose Bayesian-LoRA (B-LoRA), which approaches low-rank adaptation and quantization from a Bayesian perspective by placing a prior distribution on both quantization levels and rank values. As a result, B-LoRA can fine-tune a pre-trained model on a specific downstream task while finding the optimal rank values and quantization levels for every low-rank matrix. We validate the proposed model by fine-tuning a pre-trained DeBERTaV3 on the GLUE benchmark. We also compare it to relevant baselines and present both qualitative and quantitative results, showing that the proposed approach is able to learn optimal-rank quantized matrices. B-LoRA performs on par with or better than the baselines while reducing the total number of bit operations by roughly 70% compared to the baseline methods.
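
The low-rank decomposition that the abstract refers to can be illustrated with a minimal NumPy sketch. This shows only the generic LoRA idea (a frozen weight plus a trainable product of two thin matrices), not the Bayesian gates or the quantization scheme proposed in this work. The function name `lora_forward`, the dimensions, and the `alpha / r` scaling are assumptions chosen for illustration.

```python
import numpy as np

def lora_forward(x, W0, A, B, alpha=1.0):
    """Apply a frozen weight W0 plus a low-rank update B @ A to inputs x.

    Hypothetical helper for illustration only; the alpha / r scaling is an
    assumed convention, not something stated in the abstract.
    """
    r = A.shape[0]
    delta_W = (alpha / r) * (B @ A)  # shape (d_out, d_in), rank at most r
    return x @ (W0 + delta_W).T

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 32, 4           # illustrative sizes (assumed)

W0 = rng.normal(size=(d_out, d_in))  # frozen pre-trained weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable, small random init
B = np.zeros((d_out, r))             # trainable, zero init: update starts at 0

x = rng.normal(size=(8, d_in))       # batch of 8 input vectors
y = lora_forward(x, W0, A, B)

# Trainable parameters: full fine-tuning vs. the two low-rank factors.
full_params = d_out * d_in
lora_params = r * (d_in + d_out)
```

With these sizes the two factors hold `r * (d_in + d_out)` trainable values instead of `d_out * d_in`, which is where the parameter savings come from. Using one fixed `r` for every weight matrix is the limitation the abstract points to.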

Bayesian Low-rank Adaptation for Large Language Models

BLoB: Bayesian Low-Rank Adaptation by Backpropagation for Large Language Models

Robust and Efficient Fine-tuning of LLMs with Bayesian Reparameterization of Low-Rank Adaptation

Gaussian Stochastic Weight Averaging for Bayesian Low-Rank Adaptation of Large Language Models

LoRA ensembles for large language model fine-tuning

LoRA$^2$ : Multi-Scale Low-Rank Approximations for Fine-Tuning Large Language Models

BoRA: Bayesian Hierarchical Low-Rank Adaption for Multi-task Large Language Models

OLoRA: Orthonormal Low-Rank Adaptation of Large Language Models

Bayesian Reward Models for LLM Alignment

Training-Free Bayesianization for Low-Rank Adapters of Large Language Models

Chain of LoRA: Efficient Fine-tuning of Language Models via Residual Learning

Delta-LoRA: Fine-Tuning High-Rank Parameters with the Delta of Low-Rank Matrices

Bayesian-LoRA: LoRA based Parameter Efficient Fine-Tuning using Optimal Quantization levels and Rank Values trough Differentiable Bayesian Gates

Adaptive Feature-based Low-Rank Compression of Large Language Models Via Bayesian Optimization

BA-LoRA: Bias-Alleviating Low-Rank Adaptation to Mitigate Catastrophic Inheritance in Large Language Models

SBoRA: Low-Rank Adaptation with Regional Weight Updates

Less is More: Extreme Gradient Boost Rank-1 Adaption for Efficient Finetuning of LLMs

GeLoRA: Geometric Adaptive Ranks For Efficient LoRA Fine-tuning

A Survey on LoRA of Large Language Models