Chain-of-Verification Reduces Hallucination in Large Language Models

Shehzaad Dhuliawala, Mojtaba Komeili, Jing Xu, Roberta Raileanu, Xian Li, Asli Celikyilmaz, Jason Weston
2023-09-25
Abstract: Generation of plausible yet incorrect factual information, termed hallucination, is an unsolved issue in large language models. We study the ability of language models to deliberate on the responses they give in order to correct their mistakes. We develop the Chain-of-Verification (CoVe) method whereby the model first (i) drafts an initial response; then (ii) plans verification questions to fact-check its draft; (iii) answers those questions independently so the answers are not biased by other responses; and (iv) generates its final verified response. In experiments, we show CoVe decreases hallucinations across a variety of tasks, from list-based questions from Wikidata, closed book MultiSpanQA and longform text generation.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses hallucination in text generated by large language models (LLMs): plausible-sounding but factually incorrect output. Although scaling up the number of model parameters has improved performance on tasks such as closed-book question answering, LLMs still make mistakes on long-tail facts (facts that appear rarely in the training corpus), and these mistakes usually look reasonable even though they are wrong. The paper proposes a method called Chain-of-Verification (CoVe), which aims to reduce such hallucinations by having the model verify its own output. CoVe consists of four core steps:

1. **Generate baseline response**: Given a query, use the LLM to generate an initial response.
2. **Plan verification**: Based on the original query and the baseline response, generate a series of verification questions that check whether the facts in the baseline response are correct.
3. **Perform verification**: Answer each verification question independently, so that the answers are not biased by the baseline response, then compare them with the baseline response to find inconsistencies or errors.
4. **Generate final verified response**: Generate a revised response that incorporates the verification results and corrects any inconsistencies found.

The paper evaluates CoVe on several tasks (list-based questions from Wikidata, closed-book MultiSpanQA, and longform text generation) and reports that the method reduces hallucinations while maintaining or increasing the amount of correct content.
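
The four steps can be sketched as a short pipeline. This is a minimal illustration, not the authors' implementation: the `llm` callable (prompt in, text out), the function name `chain_of_verification`, and all prompt wordings are assumptions made for the example. The point it tries to show is that the verification questions are answered in prompts that do not contain the draft, so those answers cannot simply repeat the draft's mistakes.

```python
from typing import Callable, Dict, List, Union

# Hypothetical interface: any function that maps a prompt string to generated text.
LLM = Callable[[str], str]


def chain_of_verification(query: str, llm: LLM) -> Dict[str, Union[str, List[str]]]:
    """Run a four-step draft / plan / verify / revise loop (illustrative sketch)."""
    # Step 1: generate the baseline (draft) response.
    baseline = llm(f"Answer the question.\nQuestion: {query}")

    # Step 2: plan verification questions from the query and the draft.
    plan = llm(
        "Write verification questions, one per line, that fact-check the draft.\n"
        f"Question: {query}\nDraft: {baseline}"
    )
    questions = [line.strip() for line in plan.splitlines() if line.strip()]

    # Step 3: answer each verification question on its own.
    # The draft is deliberately left out of these prompts.
    answers = [llm(f"Answer concisely.\nQuestion: {q}") for q in questions]

    # Step 4: produce the final response, conditioned on the verification results.
    evidence = "\n".join(f"Q: {q}\nA: {a}" for q, a in zip(questions, answers))
    final = llm(
        "Revise the draft so that it is consistent with the verification results.\n"
        f"Question: {query}\nDraft: {baseline}\nVerification results:\n{evidence}"
    )

    return {
        "baseline": baseline,
        "questions": questions,
        "answers": answers,
        "final": final,
    }
```

To try it, pass any function that sends a prompt to a model and returns its text; the returned dictionary keeps the intermediate outputs so each step can be inspected.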