Cycles of Thought: Measuring LLM Confidence through Stable Explanations

Evan Becker, Stefano Soatto
2024-06-06
Abstract: In many high-risk machine learning applications it is essential for a model to indicate when it is uncertain about a prediction. While large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, their overconfidence in incorrect responses is still a well-documented failure mode. Traditional methods for ML uncertainty quantification can be difficult to directly adapt to LLMs due to the computational cost of implementation and closed-source nature of many models. A variety of black-box methods have recently been proposed, but these often rely on heuristics such as self-verbalized confidence. We instead propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer. While utilizing explanations is not a new idea in and of itself, by interpreting each possible model+explanation pair as a test-time classifier we can calculate a posterior answer distribution over the most likely of these classifiers. We demonstrate how a specific instance of this framework using explanation entailment as our classifier likelihood improves confidence score metrics (in particular AURC and AUROC) over baselines across five different datasets. We believe these results indicate that our framework is both a well-principled and effective way of quantifying uncertainty in LLMs.
Computation and Language, Machine Learning
What problem does this paper attempt to address?
### The Problem the Paper Attempts to Solve

The paper addresses uncertainty quantification for the predictions of large language models (LLMs). Although LLMs can match or even surpass human-level accuracy on various benchmarks, their overconfidence in incorrect responses remains a well-documented failure mode. Traditional machine learning uncertainty quantification methods are difficult to apply directly to LLMs because they are computationally expensive and many models are closed-source. Existing black-box methods have made some progress but still rely on heuristics such as self-verbalized confidence. The paper therefore proposes a framework that measures an LLM's uncertainty with respect to the distribution of explanations it generates for an answer.

### Specific Problem Description

1. **Overconfidence in LLMs**: LLMs can sometimes give incorrect answers with high confidence, which may mislead non-expert users.
2. **Limitations of Existing Methods**: Traditional methods are difficult to apply directly to LLMs because they usually require access to the model's internal information, and many LLMs are closed-source. Existing black-box methods do not rely on internal information but still suffer from overconfidence.
3. **Changes in Data Distribution**: Existing uncertainty quantification methods usually assume that test data and training data come from the same distribution. In practice this assumption often does not hold, so uncertainty quantification strategies perform poorly on new data.

### Solution

The paper proposes measuring an LLM's uncertainty from the distribution of explanations it generates. The method proceeds in four steps:

1. **Generate Explanations**: For a given question, sample a set of explanations from the LLM.
2. **Calculate Logical Consistency of Explanations**: Use the LLM to estimate the probability that each explanation is logically consistent with (entailed by) the question.
3. **Reweight Explanations**: Reweight the explanations by their logical consistency probabilities.
4. **Calculate Posterior Probability of Answers**: From the reweighted explanations, calculate the posterior probability of each answer.

With this method, the authors aim to quantify LLM uncertainty more accurately across different datasets, especially on complex questions. Experimental results show that it outperforms existing baseline methods on multiple datasets, particularly in selective uncertainty tasks, as measured by AURC and AUROC.
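The four steps of the solution can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `stable_explanation_confidence` and the two callables `sample_explanation` and `entailment_prob` are hypothetical stand-ins for LLM calls, and the sketch assumes, as a simplification, that each sampled explanation commits to exactly one answer.

```python
from collections import defaultdict
from typing import Callable, Dict, Tuple


def stable_explanation_confidence(
    question: str,
    sample_explanation: Callable[[str], Tuple[str, str]],
    entailment_prob: Callable[[str, str], float],
    n_samples: int = 5,
) -> Dict[str, float]:
    """Return an approximate posterior over answers for one question.

    sample_explanation(question) -> (explanation, answer)   # hypothetical LLM call
    entailment_prob(question, explanation) -> float in [0, 1]  # hypothetical LLM call
    """
    # Step 1: sample a set of explanations (each paired with the answer it supports).
    samples = [sample_explanation(question) for _ in range(n_samples)]

    # Step 2: score how logically consistent each explanation is with the question.
    weights = [entailment_prob(question, explanation) for explanation, _ in samples]

    # Step 3: normalize the scores into weights; fall back to uniform weights
    # if every explanation received a score of zero.
    total = sum(weights)
    if total == 0:
        weights = [1.0] * len(samples)
        total = float(len(samples))

    # Step 4: sum the normalized weights of the explanations supporting each answer.
    posterior: Dict[str, float] = defaultdict(float)
    for (_, answer), weight in zip(samples, weights):
        posterior[answer] += weight / total
    return dict(posterior)
```

The probability assigned to the top answer can then serve as the confidence score. Under these assumptions, an answer supported by many highly consistent explanations receives high confidence, while an answer backed only by weakly entailed explanations is down-weighted.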