Abstract: In many high-risk machine learning applications it is essential for a model to indicate when it is uncertain about a prediction. While large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, their overconfidence in incorrect responses is still a well-documented failure mode. Traditional methods for ML uncertainty quantification can be difficult to directly adapt to LLMs due to the computational cost of implementation and closed-source nature of many models. A variety of black-box methods have recently been proposed, but these often rely on heuristics such as self-verbalized confidence. We instead propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer. While utilizing explanations is not a new idea in and of itself, by interpreting each possible model+explanation pair as a test-time classifier we can calculate a posterior answer distribution over the most likely of these classifiers. We demonstrate how a specific instance of this framework using explanation entailment as our classifier likelihood improves confidence score metrics (in particular AURC and AUROC) over baselines across five different datasets. We believe these results indicate that our framework is both a well-principled and effective way of quantifying uncertainty in LLMs.

What problem does this paper attempt to address?

### The Problem the Paper Attempts to Solve

The paper attempts to address the issue of uncertainty quantification in large language models (LLMs) during prediction. Although LLMs can achieve or even surpass human-level accuracy in various benchmarks, their overconfidence in incorrect responses remains a known failure mode. Traditional machine learning uncertainty quantification methods are difficult to apply directly to LLMs because these methods usually require high computational costs, and many models are closed-source. Existing black-box methods, while making some progress, still rely on heuristic approaches such as self-reported confidence. Therefore, this paper proposes a new framework to measure the uncertainty of LLMs by generating a distribution of explanations.

### Specific Problem Description

1. **Overconfidence in LLMs**: LLMs can sometimes provide incorrect answers with high confidence, which may mislead non-expert users.
2. **Limitations of Existing Methods**: Traditional methods are difficult to apply directly to LLMs because they usually require access to the model's internal information, and many LLMs are closed-source. Existing black-box methods, although not relying on internal information, still suffer from the problem of overconfidence.
3. **Changes in Data Distribution**: Existing uncertainty quantification methods usually assume that test data and training data come from the same distribution, but in practical applications, this assumption often does not hold, leading to poor performance of uncertainty quantification strategies on new data.

### Solution

This paper proposes a method based on generating a distribution of explanations to measure the uncertainty of LLMs. Specifically, the method is implemented through the following steps:

1. **Generate Explanations**: For a given question, sample a set of explanations from the LLM.
2. **Calculate Logical Consistency of Explanations**: Use the LLM to calculate the probability of logical consistency between each explanation and the question.
3. **Reweight Explanations**: Reweight the explanations based on their logical consistency probabilities.
4. **Calculate Posterior Probability of Answers**: Based on the reweighted explanations, calculate the posterior probability of each answer.

Through this method, the authors hope to more accurately quantify the uncertainty of LLMs across different datasets, especially on complex questions. Experimental results show that this method outperforms existing baseline methods on multiple datasets, particularly in selective uncertainty tasks.
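The reweighting and posterior steps above can be illustrated with a small Python sketch. It is not the authors' implementation: it assumes the explanations have already been sampled and that each one comes with the answer it supports and a logical-consistency (entailment) probability in [0, 1]. The function names `posterior_over_answers` and `confidence`, and the example numbers, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def posterior_over_answers(samples):
    """Turn (answer, entailment_prob) pairs into a distribution over answers.

    Assumption for this sketch: each sampled explanation supports exactly one
    answer and has already been scored with a consistency probability.
    """
    if not samples:
        raise ValueError("need at least one sampled explanation")

    weights = defaultdict(float)
    for answer, entail_prob in samples:
        if not 0.0 <= entail_prob <= 1.0:
            raise ValueError("entailment probability must lie in [0, 1]")
        # Reweight: an explanation contributes in proportion to its consistency.
        weights[answer] += entail_prob

    total = sum(weights.values())
    if total == 0.0:
        # Every explanation was judged inconsistent: fall back to a uniform
        # distribution over the answers that were observed.
        return {a: 1.0 / len(weights) for a in weights}
    # Normalize so the weights sum to one.
    return {a: w / total for a, w in weights.items()}


def confidence(posterior):
    """Return the most probable answer and its posterior mass."""
    answer = max(posterior, key=posterior.get)
    return answer, posterior[answer]


# Hypothetical scores for five sampled explanations of one multiple-choice question.
samples = [("B", 0.9), ("B", 0.8), ("C", 0.3), ("B", 0.6), ("A", 0.1)]
posterior = posterior_over_answers(samples)
answer, conf = confidence(posterior)
print(answer, round(conf, 3))
```

In this sketch the posterior mass of the top answer serves as the confidence score, so a question whose explanations disagree, or are mostly judged inconsistent, ends up with a lower score than one whose consistent explanations all point to the same answer.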

Cycles of Thought: Measuring LLM Confidence through Stable Explanations
