SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models

Potsawee Manakul, Adian Liusie, Mark J. F. Gales

2023-10-12

Abstract: Generative Large Language Models (LLMs) such as GPT-3 are capable of generating highly fluent responses to a wide variety of user prompts. However, LLMs are known to hallucinate facts and make non-factual statements which can undermine trust in their output. Existing fact-checking approaches either require access to the output probability distribution (which may not be available for systems such as ChatGPT) or external databases that are interfaced via separate, often complex, modules. In this work, we propose "SelfCheckGPT", a simple sampling-based approach that can be used to fact-check the responses of black-box models in a zero-resource fashion, i.e. without an external database. SelfCheckGPT leverages the simple idea that if an LLM has knowledge of a given concept, sampled responses are likely to be similar and contain consistent facts. However, for hallucinated facts, stochastically sampled responses are likely to diverge and contradict one another. We investigate this approach by using GPT-3 to generate passages about individuals from the WikiBio dataset, and manually annotate the factuality of the generated passages. We demonstrate that SelfCheckGPT can: i) detect non-factual and factual sentences; and ii) rank passages in terms of factuality. We compare our approach to several baselines and show that our approach has considerably higher AUC-PR scores in sentence-level hallucination detection and higher correlation scores in passage-level factuality assessment compared to grey-box methods.

Computation and Language

What problem does this paper attempt to address?

This paper addresses hallucination in text generated by large language models (LLMs). Models such as GPT-3 can produce highly fluent and diverse responses, but they also tend to make non-factual statements, which undermines users' trust in their output. Existing fact-checking methods either require access to the model's output probability distribution (which may not be available for systems like ChatGPT) or rely on external databases accessed through separate, often complex, modules. The paper therefore proposes "SelfCheckGPT", a method for checking the factuality of black-box model responses in a zero-resource manner, i.e., without an external database. SelfCheckGPT compares multiple stochastically sampled responses for consistency. The underlying assumption is that if the LLM has knowledge of a concept, the sampled responses are likely to be similar and contain consistent facts, whereas for hallucinated facts the sampled responses are likely to diverge and contradict one another. The paper evaluates SelfCheckGPT on sentence-level hallucination detection and passage-level factuality assessment, reporting higher AUC-PR scores and higher correlation scores, respectively, than grey-box methods.
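The sampling-and-consistency idea can be illustrated with a minimal Python sketch. This is not the authors' implementation: plain token overlap is used here as an assumed stand-in for a real consistency measure, the function names (`sentence_support`, `inconsistency_score`, `rank_sentences`) are hypothetical, and the sampled responses are hard-coded illustrative strings rather than real LLM output.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def sentence_support(sentence, sample):
    """Fraction of the sentence's tokens that also appear in one sampled response.

    Assumption: token overlap is only a crude proxy for factual agreement.
    """
    tokens = tokenize(sentence)
    if not tokens:
        return 1.0  # an empty sentence makes no claim to contradict
    return len(tokens & tokenize(sample)) / len(tokens)


def inconsistency_score(sentence, samples):
    """Average lack of support across samples.

    0.0 means every sample contains all of the sentence's tokens;
    1.0 means no sample shares any token with the sentence.
    """
    if not samples:
        raise ValueError("need at least one sampled response")
    support = sum(sentence_support(sentence, s) for s in samples) / len(samples)
    return 1.0 - support


def rank_sentences(sentences, samples):
    """Return (score, sentence) pairs, most inconsistent first."""
    scored = [(inconsistency_score(s, samples), s) for s in sentences]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    # Made-up stand-ins for responses sampled from the same prompt.
    samples = [
        "The capital of France is Paris.",
        "Paris is the capital of France and its largest city.",
    ]
    response_sentences = [
        "Paris is the capital of France.",
        "It was founded by astronauts in 1999.",
    ]
    for score, sentence in rank_sentences(response_sentences, samples):
        print(f"{score:.2f}  {sentence}")
```

A sentence whose content recurs across the samples receives a low score, while one that the samples do not support receives a high score and would be flagged as a likely hallucination. In practice the overlap function would be replaced by a stronger comparison between each sentence and the sampled responses.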

Language Models Hallucinate, but May Excel at Fact Verification

A New Benchmark and Reverse Validation Method for Passage-level Hallucination Detection

HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models

Self-Checker: Plug-and-Play Modules for Fact-Checking with Large Language Models

LLMs Can Check Their Own Results to Mitigate Hallucinations in Traffic Understanding Tasks

CrossCheckGPT: Universal Hallucination Ranking for Multimodal Foundation Models

A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation

AutoHall: Automated Hallucination Dataset Generation for Large Language Models

FactTest: Factuality Testing in Large Language Models with Finite-Sample and Distribution-Free Guarantees

A Debate-Driven Experiment on LLM Hallucinations and Accuracy

Hallucination Detection and Hallucination Mitigation: An Investigation

Factuality challenges in the era of large language models and opportunities for fact-checking

The Perils & Promises of Fact-checking with Large Language Models

From ChatGPT to FactGPT: A Participatory Design Study to Mitigate the Effects of Large Language Model Hallucinations on Users

The Earth is Flat? Unveiling Factual Errors in Large Language Models

Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback

Factored Verification: Detecting and Reducing Hallucination in Summaries of Academic Papers

Fine-grained Hallucination Detection and Editing for Language Models