Abstract: Recently developed large language models have achieved remarkable success in generating fluent and coherent text. However, these models often 'hallucinate', which critically hampers their reliability. In this work, we address this crucial problem and propose an approach that actively detects and mitigates hallucinations during the generation process. Specifically, we first identify candidates for potential hallucination by leveraging the model's logit output values, check their correctness through a validation procedure, mitigate the detected hallucinations, and then continue with the generation process. Through extensive experiments with GPT-3.5 (text-davinci-003) on the 'article generation task', we first demonstrate the individual efficacy of our detection and mitigation techniques. Specifically, the detection technique achieves a recall of ~88% and the mitigation technique successfully mitigates 57.6% of the correctly detected hallucinations. Importantly, our mitigation technique does not introduce new hallucinations even in the case of incorrectly detected hallucinations, i.e., false positives. Then, we show that the proposed active detection and mitigation approach successfully reduces the hallucinations of the GPT-3.5 model from 47.5% to 14.5% on average. We further demonstrate the effectiveness and wide applicability of our approach through additional studies, including performance on different types of questions (multi-hop and false premise questions) and with another LLM from a different model family (Vicuna). In summary, our work contributes to improving the reliability and trustworthiness of large language models, a crucial step en route to enabling their widespread adoption in real-world applications.
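The detect, validate, mitigate, and continue loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the 0.5 threshold, the choice to score a concept by its least confident token, and the `validate` and `repair` callbacks are all hypothetical stand-ins for steps the abstract only names.

```python
import math
from typing import Callable, Dict, List, Tuple


def concept_confidence(token_logprobs: List[float]) -> float:
    """Score a concept by its least confident token.

    Assumption: taking the minimum token probability is one plausible
    aggregation; the abstract does not specify how logits are combined.
    Expects a non-empty list of log-probabilities.
    """
    return math.exp(min(token_logprobs))


def flag_candidates(
    concepts: Dict[str, List[float]], threshold: float = 0.5
) -> List[str]:
    """Return the concepts whose confidence falls below the (hypothetical) threshold."""
    return [
        name
        for name, logprobs in concepts.items()
        if concept_confidence(logprobs) < threshold
    ]


def detect_and_mitigate(
    sentence: str,
    concepts: Dict[str, List[float]],
    validate: Callable[[str, str], bool],
    repair: Callable[[str, List[str]], str],
    threshold: float = 0.5,
) -> Tuple[str, List[str]]:
    """Run one detect -> validate -> mitigate step on a generated sentence.

    `validate(sentence, concept)` returns True if the concept is supported;
    `repair(sentence, failed)` returns a corrected sentence. Both are
    placeholders for whatever validation and mitigation procedure is used.
    """
    flagged = flag_candidates(concepts, threshold)
    # Only low-confidence concepts are sent through validation.
    failed = [c for c in flagged if not validate(sentence, c)]
    if failed:
        sentence = repair(sentence, failed)
    return sentence, failed


if __name__ == "__main__":
    # Toy inputs: per-concept token log-probabilities (made-up values).
    toy_concepts = {"Paris": [-0.05, -0.1], "1887": [-1.6, -0.2]}
    fixed, failed = detect_and_mitigate(
        "The tower opened in 1887.",
        toy_concepts,
        validate=lambda s, c: False,  # pretend validation rejects the concept
        repair=lambda s, f: s.replace("1887", "1889"),
    )
    print(fixed, failed)
```

In a real generation loop, the repaired sentence would be appended to the context before the next sentence is generated, so that later text is conditioned on the corrected version.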

A Mathematical Investigation of Hallucination and Creativity in GPT Models

A Survey on Large Language Model Hallucination via a Creativity Perspective

Unravelling the Mysteries of Hallucination in Large Language Models: Strategies for Precision in Artificial Intelligence Language Generation

A Mathematical Interpretation of Autoregressive Generative Pre-Trained Transformer and Self-Supervised Learning

Embedding and Gradient Say Wrong: A White-Box Method for Hallucination Detection

Cognitive Mirage: A Review of Hallucinations in Large Language Models

Hallucination Detection and Hallucination Mitigation: An Investigation

Calibrated Language Models Must Hallucinate

A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation

UHGEval: Benchmarking the Hallucination of Chinese Large Language Models via Unconstrained Generation

No Free Lunch: Fundamental Limits of Learning Non-Hallucinating Generative Models

HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models

Probing the Creativity of Large Language Models: Can models produce divergent semantic association?

Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem

Look Within, Why LLMs Hallucinate: A Causal Perspective

FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning

A Debate-Driven Experiment on LLM Hallucinations and Accuracy

How Language Model Hallucinations Can Snowball

MetaCheckGPT -- A Multi-task Hallucination Detector Using LLM Uncertainty and Meta-models

Sources of Hallucination by Large Language Models on Inference Tasks