A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation

Neeraj Varshney, Wenlin Yao, Hongming Zhang, Jianshu Chen, Dong Yu
2023-08-12
Abstract: Recently developed large language models have achieved remarkable success in generating fluent and coherent text. However, these models often tend to 'hallucinate' which critically hampers their reliability. In this work, we address this crucial problem and propose an approach that actively detects and mitigates hallucinations during the generation process. Specifically, we first identify the candidates of potential hallucination leveraging the model's logit output values, check their correctness through a validation procedure, mitigate the detected hallucinations, and then continue with the generation process. Through extensive experiments with GPT-3.5 (text-davinci-003) on the 'article generation task', we first demonstrate the individual efficacy of our detection and mitigation techniques. Specifically, the detection technique achieves a recall of ~88% and the mitigation technique successfully mitigates 57.6% of the correctly detected hallucinations. Importantly, our mitigation technique does not introduce new hallucinations even in the case of incorrectly detected hallucinations, i.e., false positives. Then, we show that the proposed active detection and mitigation approach successfully reduces the hallucinations of the GPT-3.5 model from 47.5% to 14.5% on average. We further demonstrate the effectiveness and wide applicability of our approach through additional studies including performance on different types of questions (multi-hop and false premise questions) and with another LLM from a different model family (Vicuna). In summary, our work contributes to improving the reliability and trustworthiness of large language models, a crucial step en route to enabling their widespread adoption in real-world applications.
Computation and Language
What problem does this paper attempt to address?
The paper addresses the problem of "hallucinations" in text generated by large language models such as GPT-3.5. A hallucination is generated text that is grammatically correct and fluent but contains factual errors, illogical content, or information that does not match the input source. Such errors undermine the reliability of language models and limit their adoption in practical applications.

The paper proposes an active approach that detects and mitigates hallucinations during the generation process, in four steps:

1. **Candidate Hallucination Identification**: Identify the key concepts in each generated sentence as potential hallucination candidates.
2. **Uncertainty Calculation**: Use the model's logit output values to calculate the uncertainty of each concept.
3. **Verification Process**: Check the correctness of the uncertain concepts by creating verification queries and retrieving relevant knowledge.
4. **Hallucination Repair**: If a hallucination is detected, use the retrieved knowledge as evidence to repair the hallucinated sentence, then continue generating.

Experimental results show that the method reduces the proportion of hallucinations in text generated by GPT-3.5 from 47.5% to 14.5% on average. The approach also remains effective on different types of questions (multi-hop and false premise questions) and with a model from a different family (Vicuna). Overall, the work improves the reliability and trustworthiness of large language models, a step toward their wider use in real-world applications.
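
The four steps described above can be sketched in code. The following is a minimal illustration, not the authors' implementation: the names `Concept`, `concept_confidence`, `low_confidence_concepts`, and `detect_and_mitigate` are hypothetical, the `validate` and `repair` callbacks stand in for the retrieval-based verification and repair steps, and both the choice of aggregating token probabilities with a minimum and the threshold of 0.5 are assumptions made for the example.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Concept:
    """A key concept in a generated sentence (hypothetical container)."""
    text: str
    token_logprobs: List[float]  # log-probabilities of the concept's tokens


def concept_confidence(token_logprobs: Sequence[float]) -> float:
    """Collapse token probabilities into one score.

    Assumption: the least likely token decides the score (minimum).
    Other aggregations, such as the mean, would also fit the description.
    """
    return min(math.exp(lp) for lp in token_logprobs)


def low_confidence_concepts(
    concepts: Sequence[Concept], threshold: float = 0.5
) -> List[Concept]:
    """Steps 1-2: keep the concepts whose confidence is below the threshold."""
    return [c for c in concepts if concept_confidence(c.token_logprobs) < threshold]


def detect_and_mitigate(
    sentence: str,
    concepts: Sequence[Concept],
    validate: Callable[[str], Tuple[bool, str]],
    repair: Callable[[str, str, str], str],
    threshold: float = 0.5,
) -> str:
    """Steps 3-4: validate each uncertain concept and repair the sentence.

    validate(concept_text) returns (is_supported, evidence).
    repair(sentence, concept_text, evidence) returns the corrected sentence.
    Both callbacks are placeholders for retrieval and LLM calls.
    """
    for concept in low_confidence_concepts(concepts, threshold):
        supported, evidence = validate(concept.text)
        if not supported:
            sentence = repair(sentence, concept.text, evidence)
    return sentence


# Toy usage with stub callbacks; a real system would query a knowledge source.
def toy_validate(concept_text: str) -> Tuple[bool, str]:
    return (False, "Paris") if concept_text == "Berlin" else (True, "")


def toy_repair(sentence: str, concept_text: str, evidence: str) -> str:
    return sentence.replace(concept_text, evidence)


toy_concepts = [
    Concept("Eiffel Tower", [-0.05, -0.1]),  # confident tokens, not flagged
    Concept("Berlin", [-1.6]),               # probability about 0.2, flagged
]
print(detect_and_mitigate(
    "The Eiffel Tower is in Berlin.", toy_concepts, toy_validate, toy_repair
))
```

Only low-confidence concepts are sent to validation in this sketch, which mirrors the idea of checking a sentence before generation continues so that an early error is not carried into later sentences.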