Large Language Model Confidence Estimation via Black-Box Access

Tejaswini Pedapati, Amit Dhurandhar, Soumya Ghosh, Soham Dan, Prasanna Sattigeri
2024-10-02
Abstract: Estimating uncertainty or confidence in the responses of a model can be significant in evaluating trust not only in the responses, but also in the model as a whole. In this paper, we explore the problem of estimating confidence for responses of large language models (LLMs) with simply black-box or query access to them. We propose a simple and extensible framework where we engineer novel features and train an (interpretable) model (viz. logistic regression) on these features to estimate the confidence. We empirically demonstrate that our simple framework is effective in estimating confidence of Flan-ul2, Llama-13b and Mistral-7b on four benchmark Q&A tasks as well as of Pegasus-large and BART-large on two benchmark summarization tasks, surpassing baselines by over 10% (on AUROC) in some cases. Additionally, our interpretable approach provides insight into features that are predictive of confidence, leading to the interesting and useful discovery that our confidence models built for one LLM generalize zero-shot across others on a given dataset.
Computation and Language, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
### Problems Addressed by the Paper

This paper addresses confidence estimation for the responses of large language models (LLMs). Specifically, the authors explore how to estimate the confidence of a model's responses using only black-box (query) access. The paper proposes a simple and extensible framework that estimates confidence by engineering new features and training an interpretable model (logistic regression) on them.

### Main Contributions

1. **Proposed a New Framework**: The framework estimates the confidence of large language models by engineering new features and training an interpretable model on them.
2. **Effective Feature Engineering**: The authors propose 6 different input prompt perturbation strategies that generate features for confidence estimation.
3. **Extensive Experimental Validation**: Experiments were conducted on multiple benchmark tasks, including 4 question-answering tasks and 2 summarization tasks, demonstrating the effectiveness of the method.
4. **Cross-Model Generalization Ability**: The study found that confidence models built for one LLM generalize zero-shot to other LLMs on a given dataset, which opens the possibility of constructing universal confidence models.

### Method Overview

1. **Prompt Perturbation Strategies**:
   - **Stochastic Decoding (SD)**: Generate multiple outputs using different decoding strategies (e.g., greedy decoding, beam search, and nucleus sampling).
   - **Paraphrasing (PP)**: Paraphrase the context in the prompt.
   - **Sentence Permutation (SP)**: Change the order of named entities in the prompt.
   - **Entity Frequency Amplification (EFA)**: Repeat sentences containing named entities.
   - **Stopword Removal (SR)**: Remove stopwords from the context.
   - **Response Consistency Check (SRC)**: Randomly split the model's output into two parts and check the semantic consistency between them.
2. **Feature Construction**:
   - **Semantic Sets**: Group the outputs into semantic equivalence sets based on their semantic similarity.
   - **Lexical Similarity**: Calculate the lexical similarity between outputs.
   - **SRC Minimum**: Use the contradiction probability of a natural language inference (NLI) model to measure semantic inconsistency between response parts.
3. **Label Creation and Confidence Estimation**:
   - Create labels by matching the LLM's output with the ground truth responses in the dataset.
   - Train a logistic regression model on the features and labels, and use it to predict confidence scores.

### Experimental Results

- The method outperformed baseline methods on the AUROC and AUARC metrics across multiple benchmark tasks, by more than 10% AUROC in some cases.
- The performance improvement was particularly notable on the TriviaQA and SQuAD datasets.
- Confidence models built for one LLM generalize zero-shot to other LLMs, showing good cross-model generalization ability.

### Conclusion

This paper proposes an effective method for estimating the confidence of large language models through black-box access. The method performs well across multiple benchmark tasks and exhibits good cross-model generalization ability. This research provides new insights for improving the trustworthiness and reliability of large language models.
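
The feature-construction and confidence-estimation steps from the Method Overview can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions. Token-level Jaccard overlap stands in for the lexical similarity measure. A thresholded Jaccard score stands in for the semantic-equivalence check, which the summary describes as similarity-based. The threshold of 0.5, the toy responses, and all function names are hypothetical. Logistic regression is written out as plain gradient descent so the block has no dependencies. AUROC is computed on the toy training data only to show the metric, not as a result.

```python
import math
from itertools import combinations


def jaccard(a, b):
    """Token-level Jaccard overlap (assumed stand-in for lexical similarity)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def mean_lexical_similarity(responses):
    """Average pairwise similarity over all responses to one prompt."""
    pairs = list(combinations(responses, 2))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def count_semantic_sets(responses, equivalent):
    """Greedy grouping: a response opens a new set unless it matches a representative."""
    representatives = []
    for response in responses:
        if not any(equivalent(response, rep) for rep in representatives):
            representatives.append(response)
    return len(representatives)


def featurize(responses, threshold=0.5):
    """Map the responses for one prompt to [lexical similarity, number of sets]."""
    def equivalent(a, b):
        # Hypothetical equivalence test; an NLI or embedding model could replace it.
        return jaccard(a, b) >= threshold
    return [mean_lexical_similarity(responses),
            float(count_semantic_sets(responses, equivalent))]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def confidence(x, w, b):
    """Predicted probability that the LLM's answer is correct."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log loss; returns (weights, bias)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = confidence(xi, w, b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def auroc(scores, labels):
    """Fraction of (correct, incorrect) pairs ranked correctly; ties count half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data: responses to perturbed prompts, plus a label (1 = answer matched ground truth).
toy = [
    (["paris", "paris", "paris", "paris"], 1),
    (["paris france", "paris france", "paris", "paris france"], 1),
    (["rome", "madrid", "berlin", "oslo"], 0),
    (["1912", "1915", "1912 or so", "1920"], 0),
]
X = [featurize(responses) for responses, _ in toy]
y = [label for _, label in toy]
w, b = fit_logistic_regression(X, y)
scores = [confidence(x, w, b) for x in X]
print("features:", X)
print("confidence scores:", [round(s, 3) for s in scores])
print("AUROC on the toy training data:", auroc(scores, y))
```

The intuition the sketch tries to capture is that consistent responses (high lexical similarity, few semantic sets) should map to higher confidence. Because the model is linear, the learned weights `w` can be read directly to see which feature drives the prediction, which is the kind of interpretability the summary attributes to the logistic regression choice.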