Abstract: Estimating the uncertainty of responses of Large Language Models (LLMs) remains a critical challenge. While recent Bayesian methods have demonstrated effectiveness in quantifying uncertainty through low-rank weight updates, they typically require complex fine-tuning or post-training procedures. In this paper, we propose Training-Free Bayesianization (TFB), a novel framework that transforms existing off-the-shelf trained LoRA adapters into Bayesian ones without additional training. TFB systematically searches for the maximally acceptable level of variance in the weight posterior, constrained within a family of low-rank isotropic Gaussian distributions. We theoretically demonstrate that under mild conditions, this search process is equivalent to variational inference for the weights. Through comprehensive experiments, we show that TFB achieves superior uncertainty estimation and generalization compared to existing methods while eliminating the need for complex training procedures. Code will be available at https://github.com/Wang-ML-Lab/bayesian-peft.

What problem does this paper attempt to address?

This paper addresses uncertainty estimation for the responses of large language models (LLMs). Low-rank adapters (LoRA) are very effective at adapting LLMs to new tasks, but they provide no mechanism for systematically estimating uncertainty. Existing Bayesian methods can quantify uncertainty through low-rank weight updates, but they usually require complex fine-tuning or post-training procedures. The key challenge is therefore how to convert existing, off-the-shelf LoRA adapters into Bayesian adapters without additional training, while still obtaining effective uncertainty estimates.

### Main contributions of the paper

1. **Proposing Training-Free Bayesianization (TFB)**: The first framework that can convert a trained LoRA adapter into a Bayesian adapter without retraining, continued training, or even gradient estimation.
2. **Theoretical connection**: Establishing a theoretical connection between TFB and variational inference (VI), proving that under mild conditions the maximum-variance search of TFB is equivalent to minimizing the variational free energy.
3. **Efficient implementation**: Developing an efficient implementation that only requires an anchor dataset for the search, making it widely applicable across domains and LoRA variants.
4. **Experimental verification**: Evaluating TFB across multiple settings, datasets, LLM backbones, LoRA weights, and LoRA variants through extensive experiments, showing that it outperforms existing methods in uncertainty estimation and generalization.

### Method overview

The core idea of TFB is to constrain the weight posterior of the LoRA adapter to a family of low-rank isotropic Gaussian distributions and to systematically search for the maximum acceptable variance of that posterior.

The specific steps are:

1. **Low-rank isotropic Gaussian posterior**: Restrict the weight posterior to a more compact family of Gaussian distributions, obtained by projecting a full-space isotropic Gaussian onto the low-rank space.
2. **Determining the maximum variance**: Find the largest standard deviation \(\sigma_q\) of the weight posterior by solving an optimization problem whose constraint keeps the performance degradation within an acceptable range.
3. **Algorithm implementation**: Determine the optimal \(\sigma_q\) through binary search or grid search, and use this value to Bayesianize all LoRA layers.

### Theoretical analysis

The paper also provides a theoretical analysis, proving that under specific conditions the TFB search is equivalent to variational inference. This gives TFB a theoretical foundation while keeping the procedure simple and practical.

In conclusion, the paper tackles the problem of Bayesianizing the LoRA adapters of large language models without additional training by proposing Training-Free Bayesianization (TFB), and reports more accurate uncertainty estimation and better generalization as a result.
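The variance search described in the method overview can be illustrated with a small sketch. This is not the authors' implementation. It assumes a toy model with a single scalar weight standing in for a LoRA adapter, isotropic Gaussian noise added directly to that weight, and a degradation measure (increase in mean squared error on a small anchor set) that does not decrease as \(\sigma_q\) grows. The names `search_max_sigma`, `make_toy_degradation`, and `tolerance` are hypothetical and chosen only for illustration.

```python
import random
from typing import Callable, Sequence


def search_max_sigma(
    degradation: Callable[[float], float],
    tolerance: float,
    sigma_hi: float = 1.0,
    steps: int = 30,
) -> float:
    """Binary search for the largest sigma whose degradation stays <= tolerance.

    Assumes degradation(sigma) is non-decreasing in sigma. If sigma_hi itself
    is acceptable, the result simply approaches sigma_hi.
    """
    lo, hi = 0.0, sigma_hi
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if degradation(mid) <= tolerance:
            lo = mid  # still acceptable: try a larger variance
        else:
            hi = mid  # too much degradation: shrink the variance
    return lo


def make_toy_degradation(
    w: float, xs: Sequence[float], n_samples: int = 64, seed: int = 0
) -> Callable[[float], float]:
    """Build a toy degradation function on an anchor set (hypothetical setup).

    The 'adapter' is one scalar weight w, the anchor targets are y = w * x,
    and the loss is the MSE averaged over sampled weights w + sigma * eps.
    """
    ys = [w * x for x in xs]

    def mean_loss(sigma: float) -> float:
        # Re-seeding gives the same noise draws for every sigma (common
        # random numbers), so the estimate is deterministic and monotone.
        rng = random.Random(seed)
        total = 0.0
        for _ in range(n_samples):
            w_sampled = w + sigma * rng.gauss(0.0, 1.0)
            total += sum((w_sampled * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
        return total / n_samples

    baseline = mean_loss(0.0)  # loss of the deterministic (non-Bayesian) weight
    return lambda sigma: mean_loss(sigma) - baseline


degradation = make_toy_degradation(w=2.0, xs=[0.5, 1.0, 1.5, 2.0])
sigma_q = search_max_sigma(degradation, tolerance=0.05)
print(f"largest acceptable sigma_q on the toy problem: {sigma_q:.4f}")
```

A grid search would replace the loop in `search_max_sigma` with an evaluation over a fixed list of candidate values. Binary search needs fewer evaluations but relies on the monotonicity assumption stated above.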

Training-Free Bayesianization for Low-Rank Adapters of Large Language Models

BLoB: Bayesian Low-Rank Adaptation by Backpropagation for Large Language Models

Bayesian Low-rank Adaptation for Large Language Models

Adaptive Feature-based Low-Rank Compression of Large Language Models Via Bayesian Optimization

Gaussian Stochastic Weight Averaging for Bayesian Low-Rank Adaptation of Large Language Models

Feature-based Low-Rank Compression of Large Language Models via Bayesian Optimization

Robust and Efficient Fine-tuning of LLMs with Bayesian Reparameterization of Low-Rank Adaptation

LoRA ensembles for large language model fine-tuning

$l_{1}$-norm Low-Rank Matrix Factorization by Variational Bayesian Method

Bayesian-LoRA: LoRA based Parameter Efficient Fine-Tuning using Optimal Quantization levels and Rank Values trough Differentiable Bayesian Gates

Accurate and Efficient Fine-Tuning of Quantized Large Language Models Through Optimal Balance

Personalizing Low-Rank Bayesian Neural Networks Via Federated Learning

BayesAdapter: Being Bayesian, Inexpensively and Reliably, via Bayesian Fine-tuning

Bayesian Concept Bottleneck Models with LLM Priors

Large Language Models to Enhance Bayesian Optimization

Bayesian Reward Models for LLM Alignment

BA-LoRA: Bias-Alleviating Low-Rank Adaptation to Mitigate Catastrophic Inheritance in Large Language Models

BIRD: A Trustworthy Bayesian Inference Framework for Large Language Models

AdaRankGrad: Adaptive Gradient-Rank and Moments for Memory-Efficient LLMs Training and Fine-Tuning