Abstract: This work studies improving large language model (LLM) generations at inference time by mitigating fact-conflicting hallucinations. In particular, we propose a self-endorsement framework that leverages fine-grained fact-level comparisons across multiple sampled responses. Compared with prior ensemble methods (Wang et al., 2022; Chen et al., 2023) that perform response-level selection, our approach can better alleviate hallucinations, especially for long-form generation tasks. Our approach can broadly benefit smaller and open-source LLMs as it mainly conducts simple content-based comparisons. Experiments on Biographies show that our method can effectively improve the factuality of generations with simple and intuitive prompts across different scales of LLMs. In addition, comprehensive analyses on TriviaQA and GSM8K demonstrate the potential of self-endorsement for broader application.

What problem does this paper attempt to address?

This paper attempts to solve the problem of fact-conflicting hallucinations in text generated by large language models (LLMs). Specifically, the paper proposes a self-endorsement framework that alleviates this problem by conducting fine-grained fact-level comparisons among multiple sampled responses. Compared with previous methods, such as self-consistency and chain-of-verification, this method can more effectively reduce hallucinations in long-text generation tasks and improve the factual accuracy of the generated content.

### Main contributions of the paper

1. **Proposing the self-endorsement framework**: This framework identifies reliable facts by conducting fine-grained fact-level comparisons among multiple responses and generates the final response based on these facts. This method can not only reduce hallucinations but also improve reasoning ability.
2. **Applicable to LLMs of different scales**: This method mainly relies on simple content-based comparisons, so it can be widely applied to smaller and open-source LLMs.
3. **Experimental verification**: The paper conducts experiments on multiple benchmark datasets such as Biographies, TriviaQA, and GSM8K. The results show that the self-endorsement framework can significantly improve the factual accuracy and reasoning quality of the generated content.

### Method overview

1. **Candidate sampling**: Generate multiple candidate responses from the target LLM.
2. **Fact decomposition**: Decompose each candidate response into multiple facts.
3. **Fact verification**: Calculate the endorsement score of each fact by comparing it with other candidate responses.
4. **Final response generation**: Select or regenerate the final response according to the endorsement score.

### Experimental results

- **Biographies**: In long-text generation tasks, the self-endorsement framework significantly improves factual accuracy (Fact Acc.), especially when regenerating responses.
- **TriviaQA**: In question-answering tasks, the self-endorsement framework also improves factual accuracy, but has limited improvement on answer recall rate (Ans. Rec.).
- **GSM8K**: In mathematical reasoning tasks, the self-endorsement framework not only improves the accuracy of the final answer but also improves the quality of intermediate reasoning steps.

### Conclusion

The self-endorsement framework proposed in this paper effectively reduces fact-conflicting hallucinations in the content generated by LLMs through fine-grained fact-level comparisons, and improves the factual accuracy and reasoning quality of the generated content. This method is applicable to LLMs of different scales and has broad application prospects.
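The four steps listed in the method overview can be illustrated with a small Python sketch. This is not the paper's implementation. In the paper, sampling, decomposition, and verification are carried out by prompting an LLM. Here the candidates are passed in as plain strings, `split_into_facts` simply splits on sentence boundaries, and `supports` uses a token-overlap heuristic. All function names, the `min_overlap` value, and the `threshold` value are assumptions made for illustration only.

```python
import re


def split_into_facts(response):
    """Hypothetical stand-in for LLM fact decomposition: one fact per sentence."""
    parts = re.split(r"(?<=[.!?])\s+", response.strip())
    return [p for p in parts if p]


def tokens(text):
    """Lowercased alphanumeric tokens of a text, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def supports(fact, response, min_overlap=0.8):
    """Hypothetical stand-in for LLM verification.

    A response "supports" a fact if some sentence in it contains at least
    `min_overlap` of the fact's tokens. The paper prompts an LLM instead.
    """
    fact_tokens = tokens(fact)
    if not fact_tokens:
        return False
    return any(
        len(fact_tokens & tokens(sentence)) / len(fact_tokens) >= min_overlap
        for sentence in split_into_facts(response)
    )


def endorsement_scores(candidates):
    """Score each fact by the fraction of OTHER candidates that support it."""
    scored = []
    for i, candidate in enumerate(candidates):
        others = candidates[:i] + candidates[i + 1:]
        for fact in split_into_facts(candidate):
            votes = sum(supports(fact, other) for other in others)
            scored.append((fact, votes / len(others) if others else 0.0))
    return scored


def select_facts(candidates, threshold=0.5):
    """Keep each sufficiently endorsed fact once, in first-seen order."""
    seen, kept = set(), []
    for fact, score in endorsement_scores(candidates):
        key = frozenset(tokens(fact))
        if score >= threshold and key not in seen:
            seen.add(key)
            kept.append(fact)
    return kept


if __name__ == "__main__":
    # Three sampled responses; the second contains an unsupported claim.
    candidates = [
        "Ada Lovelace was born in 1815. She worked with Charles Babbage.",
        "Ada Lovelace was born in 1815. She was a painter.",
        "Ada Lovelace was born in 1815. She worked with Charles Babbage.",
    ]
    for fact in select_facts(candidates):
        print(fact)
```

In this toy run the claim that appears in only one candidate receives an endorsement score of 0 and is dropped, while the two facts shared across candidates are kept. In the regeneration variant, the kept facts would be handed back to the LLM as context for writing the final response, a step this sketch omits.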

Fine-Grained Self-Endorsement Improves Factuality and Reasoning

Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation

Internal Consistency and Self-Feedback in Large Language Models: A Survey

Towards Mitigating Hallucination in Large Language Models via Self-Reflection

Ever: Mitigating Hallucination in Large Language Models through Real-Time Verification and Rectification

Learning to Trust Your Feelings: Leveraging Self-awareness in LLMs for Hallucination Mitigation

Large Language Models Are Better Reasoners with Self-Verification

Lower Layer Matters: Alleviating Hallucination via Multi-Layer Fusion Contrastive Decoding with Truthfulness Refocused

OntoFact: Unveiling Fantastic Fact-Skeleton of LLMs Via Ontology-Driven Reinforcement Learning

Mitigating Large Language Model Hallucination with Faithful Finetuning

Integrative Decoding: Improve Factuality via Implicit Self-consistency

Long-form factuality in large language models

Large Language Models Can Self-Improve in Long-context Reasoning

Language Models Hallucinate, but May Excel at Fact Verification

Large Language Models are reasoners with Self-Verification

Alleviating Hallucinations of Large Language Models through Induced Hallucinations

Self-Evaluation Improves Selective Generation in Large Language Models

Advancing Large Language Model Attribution through Self-Improving

Adaptive Activation Steering: A Tuning-Free LLM Truthfulness Improvement Method for Diverse Hallucinations Categories

Improving Factuality with Explicit Working Memory

Alleviating Hallucinations in Large Language Models with Scepticism Modeling