Abstract: Large Language Models (LLMs) are known to hallucinate, whereby they generate plausible but inaccurate text. This phenomenon poses significant risks in critical applications, such as medicine or law, necessitating robust hallucination mitigation strategies. While recent works have proposed fine-tuning methods to teach LLMs to abstain from answering questions beyond their knowledge or capabilities, these methods rely on the existence of ground-truth labels or are limited to short-form responses. To address these limitations, we propose fine-tuning using semantic entropy, an uncertainty measure derived from introspection into the model which does not require external labels. We demonstrate that our approach matches or outperforms models fine-tuned using prior work and achieves strong performance for both short and long-form generations on a range of datasets.
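Semantic entropy measures uncertainty over the *meanings* of a model's answers rather than over exact token sequences. The following is a minimal sketch of the discrete variant, assuming several answers have already been sampled from the model: it groups answers that share a meaning and computes the entropy of the resulting cluster frequencies. The `naive_equivalent` helper is a hypothetical placeholder that compares normalized strings; semantic entropy is usually computed with a bidirectional-entailment check from an NLI model instead, which this sketch does not implement.

```python
import math
from typing import Callable, List


def cluster_by_meaning(
    answers: List[str], equivalent: Callable[[str, str], bool]
) -> List[List[str]]:
    """Greedily group answers into clusters that share a meaning."""
    clusters: List[List[str]] = []
    for answer in answers:
        for cluster in clusters:
            # Use the cluster's first member as its representative.
            if equivalent(cluster[0], answer):
                cluster.append(answer)
                break
        else:
            # No existing cluster matched, so this answer starts a new meaning.
            clusters.append([answer])
    return clusters


def discrete_semantic_entropy(
    answers: List[str], equivalent: Callable[[str, str], bool]
) -> float:
    """Entropy (in nats) of the empirical distribution over meaning clusters."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    clusters = cluster_by_meaning(answers, equivalent)
    n = len(answers)
    probs = [len(c) / n for c in clusters]
    # H = -sum p log p, written as sum p log(1/p) so a single cluster gives 0.0.
    return sum(p * math.log(1.0 / p) for p in probs)


def naive_equivalent(a: str, b: str) -> bool:
    """Hypothetical stand-in for an NLI-based bidirectional entailment check."""
    def norm(s: str) -> str:
        kept = (ch for ch in s.lower() if ch.isalnum() or ch.isspace())
        return "".join(kept).strip()
    return norm(a) == norm(b)


if __name__ == "__main__":
    confident = ["Paris", "paris.", "Paris", "Paris!"]
    uncertain = ["Paris", "Lyon", "Marseille", "Nice"]
    print(discrete_semantic_entropy(confident, naive_equivalent))  # 0.0
    print(discrete_semantic_entropy(uncertain, naive_equivalent))  # approximately ln 4 ≈ 1.386
```

Answers that agree in meaning collapse into one cluster and yield zero entropy, while answers that scatter across distinct meanings push the entropy up, which is the signal used to flag a question as uncertain.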

What problem does this paper attempt to address?

This paper addresses hallucination in large language models (LLMs) when they handle tasks beyond their knowledge or reasoning capabilities. LLMs sometimes generate plausible-sounding but incorrect information, which can have serious consequences in critical applications such as healthcare or law. Effective mitigation strategies are therefore needed to make LLMs safe, credible, and reliable. The paper proposes a fine-tuning method that uses semantic entropy as an uncertainty measure, teaching the model to decline to answer when it is uncertain. The method requires no external labels, applies to both short-form and long-form generation, and performs well across multiple datasets, reducing the hallucination rate.

### Main Contributions

1. **Performance improvement**: Models fine-tuned with semantic entropy match or outperform models fine-tuned with existing methods in both long-form (Long-QA) and short-form (Short-QA) question-answering settings.
2. **New evaluation metric**: The paper introduces the accuracy-engagement distance (AED), a metric for evaluating the degree of hallucination that accounts for both the accuracy of the model's answers and its engagement, i.e. how often it answers rather than abstains.
3. **Wide applicability**: The method works for both short-form and long-form answers and does not rely on labeled data, which makes it more scalable.

### Method Overview

- **Dataset construction**: For each question, generate one standard answer at a low temperature and multiple sampled answers at a high temperature. The semantic entropy of the sampled answers determines whether the question is high-uncertainty or low-uncertainty.
- **Fine-tuning process**: Split the questions into a high-uncertainty group and a low-uncertainty group according to their semantic entropy. Relabel high-uncertainty questions with the target "I don't know the answer" and keep the standard answer for low-uncertainty questions. Then train the model with supervised fine-tuning using a cross-entropy loss.

### Experimental Results

- **Best-threshold evaluation**: Across multiple datasets, the model fine-tuned with semantic entropy performs better in most cases, especially on long-form answering tasks.
- **All-threshold evaluation**: Plotting model performance across different uncertainty thresholds shows that, on long-form answering tasks, the model fine-tuned with semantic entropy lies on the frontier, achieving a lower AED across thresholds.

### Conclusion

The proposed method makes significant progress in reducing LLM hallucination, especially on long-form answering tasks. As a more explicit uncertainty measure, semantic entropy helps the model learn and generalize, suggesting a new direction for future fine-tuning research.
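The dataset-construction and relabeling steps of the method can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each question already carries its low-temperature standard answer and a precomputed semantic entropy score, and the `Example` type, the strict `>` comparison, and the threshold value are hypothetical choices.

```python
from dataclasses import dataclass
from typing import Dict, List

# Abstention target for high-uncertainty questions (wording taken from the summary).
IDK_ANSWER = "I don't know the answer"


@dataclass
class Example:
    """Hypothetical container for one question and its precomputed signals."""
    question: str
    standard_answer: str     # answer generated at a low temperature
    semantic_entropy: float  # computed from high-temperature samples


def build_abstention_dataset(
    examples: List[Example], threshold: float
) -> List[Dict[str, str]]:
    """Return prompt/completion pairs; high-entropy questions get the abstention target."""
    dataset = []
    for ex in examples:
        # Assumption: entropy strictly above the threshold counts as high uncertainty.
        high_uncertainty = ex.semantic_entropy > threshold
        target = IDK_ANSWER if high_uncertainty else ex.standard_answer
        dataset.append({"prompt": ex.question, "completion": target})
    return dataset


if __name__ == "__main__":
    examples = [
        Example("What is the capital of France?", "Paris", 0.0),
        Example("What was the population of a small village in 1820?", "About 300", 1.2),
    ]
    for row in build_abstention_dataset(examples, threshold=0.5):
        print(row)
```

The resulting prompt/completion pairs would then feed an ordinary supervised fine-tuning loop with a cross-entropy loss on the completion. Sweeping `threshold` trades off how often the fine-tuned model abstains against how often it answers.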
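The accuracy-engagement distance (AED) can be illustrated as a distance from an ideal model that answers every question correctly. The paper's exact formula is not reproduced in this summary, so this sketch rests on stated assumptions: engagement is the fraction of questions answered, accuracy is the fraction of answered questions that are correct, and AED is the Euclidean distance from the ideal point (accuracy 1, engagement 1), so lower is better.

```python
import math
from typing import List, Optional


def accuracy_engagement_distance(outcomes: List[Optional[bool]]) -> float:
    """Assumed AED: distance from (accuracy=1, engagement=1).

    outcomes[i] is True (correct answer), False (hallucinated answer),
    or None (the model abstained).
    """
    if not outcomes:
        raise ValueError("need at least one question")
    answered = [o for o in outcomes if o is not None]
    engagement = len(answered) / len(outcomes)
    # Assumption: accuracy is measured over answered questions only, and a
    # model that abstains on every question is assigned accuracy 0 here.
    accuracy = sum(answered) / len(answered) if answered else 0.0
    return math.hypot(1.0 - accuracy, 1.0 - engagement)


if __name__ == "__main__":
    # Always correct when it answers, but abstains on half the questions.
    print(accuracy_engagement_distance([True, True, None, None]))  # 0.5
    # Answers everything, but half of the answers are wrong.
    print(accuracy_engagement_distance([True, False, True, False]))  # 0.5
```

Under this assumed definition, a model cannot lower its AED simply by abstaining more (engagement drops) or by answering everything (accuracy drops), which is why the metric captures the balance between the two.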

Fine-Tuning Large Language Models to Appropriately Abstain with Semantic Entropy
