Abstract: Large Language Models (LLMs) are known to hallucinate, whereby they generate plausible but inaccurate text. This phenomenon poses significant risks in critical applications, such as medicine or law, necessitating robust hallucination mitigation strategies. While recent works have proposed fine-tuning methods to teach LLMs to abstain from answering questions beyond their knowledge or capabilities, these methods rely on the existence of ground-truth labels or are limited to short-form responses. To address these limitations, we propose fine-tuning using semantic entropy, an uncertainty measure derived from introspection into the model which does not require external labels. We demonstrate that our approach matches or outperforms models fine-tuned using prior work and achieves strong performance for both short and long-form generations on a range of datasets.
What problem does this paper attempt to address?
This paper addresses the hallucination problem that occurs when large language models (LLMs) handle tasks beyond their knowledge or reasoning capabilities. Specifically, LLMs sometimes generate plausible but incorrect information, which can have serious consequences in critical applications such as healthcare or law. Effective strategies are therefore required to mitigate hallucination and ensure the safety, credibility, and reliability of LLMs.
The paper proposes a new fine-tuning method that uses semantic entropy as an uncertainty measure, teaching the model to abstain from answering when it is uncertain. The method does not require external labels, applies to both short-form and long-form generation tasks, and performs well on multiple datasets, significantly reducing the hallucination rate.
### Main Contributions
1. **Performance Improvement**: The model fine-tuned with semantic entropy matches or outperforms existing fine-tuning methods in both the long-form (Long-QA) and short-form (Short-QA) answering settings.
2. **New Evaluation Metric**: The paper introduces the accuracy-engagement distance (AED), a new metric for evaluating the degree of model hallucination that takes into account both the accuracy and the engagement of the model.
3. **Wide Applicability**: The method applies to both short-form and long-form answers. Because it does not rely on labeled data, it also scales more easily.
### Method Overview
- **Dataset Construction**: For each question, generate one standard answer (low-temperature setting) and multiple variant answers (high-temperature setting). The semantic entropy of the variant answers determines whether the question is treated as high-uncertainty or low-uncertainty.
- **Fine-Tuning Process**: Split the questions into high-uncertainty and low-uncertainty groups according to their semantic entropy. Replace the labels of high-uncertainty questions with "I don't know the answer", and keep the standard answers for low-uncertainty questions. Train the model using supervised fine-tuning with the cross-entropy loss.
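
The two steps above can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: semantic entropy is computed in its discrete form, as the entropy of the share of sampled answers falling into each meaning cluster; the equivalence check `normalized_match` is a simple string-normalization stand-in for whatever semantic-equivalence model is actually used; and the function names and the `threshold` parameter are hypothetical.

```python
import math
from typing import Callable, List

# Abstention label quoted in the summary above.
IDK = "I don't know the answer"


def normalized_match(a: str, b: str) -> bool:
    """Stand-in equivalence check (assumption): case- and
    whitespace-insensitive string equality. A real pipeline would
    compare meanings, not surface strings."""
    return " ".join(a.lower().split()) == " ".join(b.lower().split())


def cluster_answers(
    answers: List[str],
    same_meaning: Callable[[str, str], bool] = normalized_match,
) -> List[List[str]]:
    """Greedily group answers: each answer joins the first cluster whose
    representative (first member) it matches, else starts a new cluster."""
    clusters: List[List[str]] = []
    for ans in answers:
        for cluster in clusters:
            if same_meaning(cluster[0], ans):
                cluster.append(ans)
                break
        else:
            clusters.append([ans])
    return clusters


def semantic_entropy(
    answers: List[str],
    same_meaning: Callable[[str, str], bool] = normalized_match,
) -> float:
    """Entropy (in nats) of the empirical distribution over clusters."""
    clusters = cluster_answers(answers, same_meaning)
    total = len(answers)
    return -sum(
        (len(c) / total) * math.log(len(c) / total) for c in clusters
    )


def build_target(
    standard_answer: str, sampled_answers: List[str], threshold: float
) -> str:
    """Return the fine-tuning label: the abstention string when the
    sampled answers disagree too much, else the standard answer."""
    if semantic_entropy(sampled_answers) > threshold:
        return IDK
    return standard_answer
```

With this sketch, a question whose high-temperature samples all agree has zero entropy and keeps its standard answer, while a question whose samples all differ reaches the maximum entropy for that sample count and is relabeled with the abstention string. The resulting question and label pairs would then feed an ordinary supervised fine-tuning run.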
### Experimental Results
- **Best Threshold Evaluation**: Experiments on multiple datasets show that the model fine-tuned with semantic entropy performs better in most cases, especially in long-form answering tasks.
- **All Threshold Evaluation**: An adaptation graph shows model performance under different thresholds. In long-form answering tasks, the model fine-tuned with semantic entropy forms a frontier line, indicating that it achieves a lower AED across different uncertainty thresholds.
### Conclusion
The proposed method makes significant progress in reducing LLM hallucination, especially in long-form answering tasks. Semantic entropy, as a more explicit uncertainty measure, helps the model learn and generalize, and offers a new direction for future fine-tuning research.