A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation

Neeraj Varshney, Wenlin Yao, Hongming Zhang, Jianshu Chen, Dong Yu

2023-08-12

Abstract: Recently developed large language models have achieved remarkable success in generating fluent and coherent text. However, these models often tend to 'hallucinate' which critically hampers their reliability. In this work, we address this crucial problem and propose an approach that actively detects and mitigates hallucinations during the generation process. Specifically, we first identify the candidates of potential hallucination leveraging the model's logit output values, check their correctness through a validation procedure, mitigate the detected hallucinations, and then continue with the generation process. Through extensive experiments with GPT-3.5 (text-davinci-003) on the 'article generation task', we first demonstrate the individual efficacy of our detection and mitigation techniques. Specifically, the detection technique achieves a recall of ~88% and the mitigation technique successfully mitigates 57.6% of the correctly detected hallucinations. Importantly, our mitigation technique does not introduce new hallucinations even in the case of incorrectly detected hallucinations, i.e., false positives. Then, we show that the proposed active detection and mitigation approach successfully reduces the hallucinations of the GPT-3.5 model from 47.5% to 14.5% on average. We further demonstrate the effectiveness and wide applicability of our approach through additional studies including performance on different types of questions (multi-hop and false premise questions) and with another LLM from a different model family (Vicuna). In summary, our work contributes to improving the reliability and trustworthiness of large language models, a crucial step en route to enabling their widespread adoption in real-world applications.

Computation and Language

What problem does this paper attempt to address?

The paper addresses "hallucinations" in text generated by large language models such as GPT-3.5. A hallucination is generated text that is grammatically correct and fluent but contains factual errors, illogical content, or information that does not match the input source. Hallucinations undermine the reliability of language models and limit their adoption in practical applications.

The paper proposes an approach that detects and mitigates hallucinations actively, during generation, in four steps:

1. **Candidate Hallucination Identification**: Identify the key concepts in each generated sentence as potential hallucination candidates.
2. **Uncertainty Calculation**: Use the model's logit output values to calculate the uncertainty of each concept.
3. **Verification Process**: Check the correctness of the uncertain concepts by creating verification queries and retrieving relevant knowledge.
4. **Hallucination Repair**: If a hallucination is detected, use the retrieved knowledge as evidence to repair the hallucinated sentence.

Experiments show that this approach reduces the proportion of hallucinations in text generated by GPT-3.5 from 47.5% to 14.5% on average. The approach also remains effective on different types of questions (multi-hop and false-premise questions) and with a model from a different family (Vicuna). Overall, the work improves the reliability and trustworthiness of large language models, a step toward their wider use in real-world applications.
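The four steps above can be sketched in Python as a rough illustration. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `min` aggregation over token probabilities, the 0.5 threshold, and the `retrieve`, `is_supported`, and `repair` callables are all hypothetical placeholders. Step 1 (key-concept extraction) is assumed to have already produced a mapping from each concept to the log-probabilities of its tokens.

```python
import math
from typing import Callable, Dict, List


def concept_confidence(token_logprobs: List[float], how: str = "min") -> float:
    """Aggregate per-token log-probabilities into one score in [0, 1].

    Assumption: "min" treats a concept as only as reliable as its least
    certain token; "mean" is offered as an alternative.
    """
    probs = [math.exp(lp) for lp in token_logprobs]
    if how == "min":
        return min(probs)
    if how == "mean":
        return sum(probs) / len(probs)
    raise ValueError(f"unknown aggregation: {how}")


def validate_and_repair(
    sentence: str,
    concept_logprobs: Dict[str, List[float]],
    retrieve: Callable[[str], str],
    is_supported: Callable[[str, str, str], bool],
    repair: Callable[[str, str, str], str],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> str:
    """Apply steps 2-4 to one generated sentence and return the result."""
    for concept, logprobs in concept_logprobs.items():
        # Step 2: skip concepts the model was confident about.
        if concept_confidence(logprobs) >= threshold:
            continue
        # Step 3: build a verification query and retrieve evidence.
        evidence = retrieve(f"Is '{concept}' correct in: {sentence}")
        # Step 4: repair the sentence only if the evidence contradicts it.
        if not is_supported(sentence, concept, evidence):
            sentence = repair(sentence, concept, evidence)
    return sentence


# Toy stand-ins for retrieval, validation, and repair (illustrative only).
fixed = validate_and_repair(
    sentence="The Titanic sank in 1915.",
    concept_logprobs={"Titanic": [-0.01, -0.02], "1915": [-1.6]},
    retrieve=lambda query: "The Titanic sank in 1912.",
    is_supported=lambda sent, concept, evidence: concept in evidence,
    repair=lambda sent, concept, evidence: evidence,
)
print(fixed)  # The Titanic sank in 1912.
```

In this toy run only "1915" falls below the threshold, so it is the only concept that triggers retrieval and repair. In a real system the three callables would wrap a search backend and further LLM calls, and the repaired sentence would be appended to the context before generation continues.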
