Chain-of-Verification Reduces Hallucination in Large Language Models

Shehzaad Dhuliawala, Mojtaba Komeili, Jing Xu, Roberta Raileanu, Xian Li, Asli Celikyilmaz, Jason Weston

2023-09-25

Abstract: Generation of plausible yet incorrect factual information, termed hallucination, is an unsolved issue in large language models. We study the ability of language models to deliberate on the responses they give in order to correct their mistakes. We develop the Chain-of-Verification (CoVe) method whereby the model first (i) drafts an initial response; then (ii) plans verification questions to fact-check its draft; (iii) answers those questions independently so the answers are not biased by other responses; and (iv) generates its final verified response. In experiments, we show CoVe decreases hallucinations across a variety of tasks, from list-based questions from Wikidata, closed book MultiSpanQA and longform text generation.

Computation and Language, Artificial Intelligence

What problem does this paper attempt to address?

This paper addresses hallucination in large language models (LLMs): generated text that is plausible but factually incorrect. Although larger models perform better on tasks such as closed-book question answering, they still make mistakes on long-tail facts (facts that appear rarely in the training corpus), and these mistakes usually look reasonable. The paper proposes Chain-of-Verification (CoVe), a method that aims to reduce hallucination by having the model verify its own output. CoVe has four core steps:

1. **Generate baseline response**: Given a query, use the LLM to generate an initial response.
2. **Plan verification**: Based on the original query and the baseline response, generate a series of verification questions that check whether the facts in the baseline response are correct.
3. **Perform verification**: Answer each verification question independently, so the answers are not biased by other responses, then check the answers against the baseline response for inconsistencies or errors.
4. **Generate final verified response**: Using any inconsistencies discovered, generate a revised response that incorporates the verification results.

Experiments on several tasks (list-based questions from Wikidata, closed-book MultiSpanQA, and longform text generation) show that CoVe reduces hallucinations while maintaining or increasing the amount of correct content.
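The four steps can be sketched as a short Python function. This is a minimal illustration under stated assumptions, not the authors' implementation: `llm` is a hypothetical callable that maps a prompt string to a completion string, the function name `chain_of_verification` is made up for this example, and the prompt wording is invented here rather than taken from the paper.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: any function that maps a prompt to generated text.
LLM = Callable[[str], str]


def chain_of_verification(query: str, llm: LLM) -> str:
    """Sketch of the CoVe loop; all prompt strings are illustrative assumptions."""
    # (i) Draft an initial (baseline) response.
    baseline = llm(f"Answer the question.\nQuestion: {query}")

    # (ii) Plan verification questions, expected one per line.
    plan = llm(
        "List verification questions, one per line, that fact-check this answer.\n"
        f"Question: {query}\nAnswer: {baseline}"
    )
    questions: List[str] = [q.strip() for q in plan.splitlines() if q.strip()]

    # (iii) Answer each question independently: the prompt contains only the
    # verification question, not the draft or the other answers.
    checks: List[Tuple[str, str]] = [
        (q, llm(f"Answer concisely.\nQuestion: {q}")) for q in questions
    ]

    # (iv) Generate the final response from the draft plus verification results.
    evidence = "\n".join(f"Q: {q}\nA: {a}" for q, a in checks)
    return llm(
        "Revise the draft so it is consistent with the verification results.\n"
        f"Question: {query}\nDraft: {baseline}\nVerification:\n{evidence}"
    )
```

Keeping the draft out of the step (iii) prompts is what makes the answers independent in this sketch; a real system would plug an actual model client into `llm`.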

Alleviating Hallucinations of Large Language Models through Induced Hallucinations

Chain of Natural Language Inference for Reducing Large Language Model Ungrounded Hallucinations

Towards Detecting LLMs Hallucination via Markov Chain-based Multi-agent Debate Framework

Ever: Mitigating Hallucination in Large Language Models through Real-Time Verification and Rectification

Towards Mitigating Hallucination in Large Language Models via Self-Reflection

Deductive Verification of Chain-of-Thought Reasoning

Minimizing Factual Inconsistency and Hallucination in Large Language Models

VERITAS: A Unified Approach to Reliability Evaluation

A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation

On Large Language Models' Hallucination with Regard to Known Facts

Language Models Hallucinate, but May Excel at Fact Verification

Improving Factuality in Large Language Models via Decoding-Time Hallucinatory and Truthful Comparators

Large Language Models are reasoners with Self-Verification

A Debate-Driven Experiment on LLM Hallucinations and Accuracy

Deceptive Semantic Shortcuts on Reasoning Chains: How Far Can Models Go without Hallucination?

Maintaining Informative Coherence: Migrating Hallucinations in Large Language Models via Absorbing Markov Chains

Unravelling the Mysteries of Hallucination in Large Language Models: Strategies for Precision in Artificial Intelligence Language Generation

Fine-grained Hallucination Detection and Editing for Language Models

Factored Verification: Detecting and Reducing Hallucination in Summaries of Academic Papers