Fine-Grained Self-Endorsement Improves Factuality and Reasoning

Ante Wang, Linfeng Song, Baolin Peng, Ye Tian, Lifeng Jin, Haitao Mi, Jinsong Su, Dong Yu
DOI: https://doi.org/10.48550/arXiv.2402.15631
2024-02-24
Abstract: This work studies improving large language model (LLM) generations at inference time by mitigating fact-conflicting hallucinations. In particular, we propose a self-endorsement framework that leverages fine-grained fact-level comparisons across multiple sampled responses. Compared with prior ensemble methods (Wang et al., 2022; Chen et al., 2023) that perform response-level selection, our approach can better alleviate hallucinations, especially for long-form generation tasks. Our approach can broadly benefit smaller and open-source LLMs, as it mainly conducts simple content-based comparisons. Experiments on Biographies show that our method can effectively improve the factuality of generations with simple and intuitive prompts across different scales of LLMs. In addition, comprehensive analyses on TriviaQA and GSM8K demonstrate the potential of self-endorsement for broader application.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
This paper attempts to solve the problem of fact-conflicting hallucinations in text generation by large language models (LLMs). Specifically, the paper proposes a self-endorsement framework that alleviates this problem by conducting fine-grained fact-level comparisons among multiple sampled responses. Compared with previous methods, such as self-consistency and chain-of-verification, this method can more effectively reduce hallucinations in long-text generation tasks and improve the factual accuracy of the generated content.

### Main contributions of the paper

1. **Proposing the self-endorsement framework**: This framework identifies reliable facts by conducting fine-grained fact-level comparisons among multiple responses and generates the final response based on these facts. This method can not only reduce hallucinations but also improve reasoning ability.
2. **Applicable to LLMs of different scales**: This method mainly relies on simple content-based comparisons, so it can be widely applied to smaller and open-source LLMs.
3. **Experimental verification**: The paper conducts experiments on multiple benchmark datasets such as Biographies, TriviaQA, and GSM8K. The results show that the self-endorsement framework can significantly improve the factual accuracy and reasoning quality of the generated content.

### Method overview

1. **Candidate sampling**: Generate multiple candidate responses from the target LLM.
2. **Fact decomposition**: Decompose each candidate response into multiple facts.
3. **Fact verification**: Calculate the endorsement score of each fact by comparing it with other candidate responses.
4. **Final response generation**: Select or regenerate the final response according to the endorsement score.

### Experimental results

- **Biographies**: In long-text generation tasks, the self-endorsement framework significantly improves factual accuracy (Fact Acc.), especially when regenerating responses.
- **TriviaQA**: In question-answering tasks, the self-endorsement framework also improves factual accuracy, but has limited improvement on answer recall rate (Ans. Rec.).
- **GSM8K**: In mathematical reasoning tasks, the self-endorsement framework not only improves the accuracy of the final answer but also improves the quality of intermediate reasoning steps.

### Conclusion

The self-endorsement framework proposed in this paper effectively reduces fact-conflicting hallucinations in the content generated by LLMs through fine-grained fact-level comparisons, and improves the factual accuracy and reasoning quality of the generated content. This method is applicable to LLMs of different scales and has broad application prospects.
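
The four steps listed in the method overview can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the paper's implementation: the paper relies on LLM prompts for decomposition and verification, whereas here sentence splitting and an exact word-containment check stand in for both. All function names (`split_into_facts`, `words_contained`, `endorsement_scores`, `select_endorsed_facts`) and the `min_score` threshold are hypothetical, and the candidate responses are assumed to have been sampled already.

```python
import re
from typing import Callable, List, Tuple


def split_into_facts(response: str) -> List[str]:
    """Stand-in for fact decomposition: treat each sentence as one fact."""
    parts = re.split(r"(?<=[.!?])\s+", response.strip())
    return [p for p in parts if p]


def words_contained(fact: str, response: str) -> bool:
    """Stand-in verifier (assumption): a response supports a fact when
    every word of the fact also occurs somewhere in the response."""
    fact_words = set(re.findall(r"\w+", fact.lower()))
    response_words = set(re.findall(r"\w+", response.lower()))
    return bool(fact_words) and fact_words <= response_words


def endorsement_scores(
    candidates: List[str],
    supports: Callable[[str, str], bool] = words_contained,
) -> List[Tuple[str, float]]:
    """Score each fact by the fraction of OTHER candidates that support it."""
    scored = []
    for i, candidate in enumerate(candidates):
        others = candidates[:i] + candidates[i + 1:]
        for fact in split_into_facts(candidate):
            if others:
                score = sum(supports(fact, o) for o in others) / len(others)
            else:
                score = 0.0  # a single candidate has nothing to endorse it
            scored.append((fact, score))
    return scored


def select_endorsed_facts(candidates: List[str], min_score: float = 0.5) -> List[str]:
    """Keep each distinct fact whose endorsement score reaches min_score."""
    seen = set()
    kept = []
    for fact, score in endorsement_scores(candidates):
        key = fact.lower()
        if score >= min_score and key not in seen:
            seen.add(key)
            kept.append(fact)
    return kept


if __name__ == "__main__":
    # Hypothetical sampled responses; the second one contains a conflicting fact.
    samples = [
        "Marie Curie was born in Warsaw. She won two Nobel Prizes.",
        "Marie Curie was born in Warsaw. She won three Nobel Prizes.",
        "Marie Curie was born in Warsaw. She won two Nobel Prizes.",
    ]
    print(" ".join(select_endorsed_facts(samples)))
```

In this sketch the final response is simply the endorsed facts joined together, which corresponds to the "select" option in step 4. The "regenerate" option would pass the endorsed facts back to the LLM as context, which is outside what a self-contained example can show.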