Abstract: Large language models (LLMs) exhibit hallucinations in long-form question-answering tasks across a wide range of domains and applications. Current hallucination detection and mitigation datasets are limited in domain coverage and size, and they are hard to scale because of prohibitive labor costs and the insufficient reliability of existing hallucination annotators. To facilitate the scalable oversight of LLM hallucinations, this paper introduces an iterative self-training framework that simultaneously and progressively scales up the hallucination annotation dataset and improves the accuracy of the hallucination annotator. Based on the Expectation Maximization (EM) algorithm, in each iteration the framework first applies a hallucination annotation pipeline to annotate a scaled dataset and then trains a more accurate hallucination annotator on that dataset. The new annotator is then adopted in the annotation pipeline for the next iteration. Extensive experimental results demonstrate that the final hallucination annotator, with only 7B parameters, surpasses the performance of GPT-4 and obtains new state-of-the-art hallucination detection results on HaluEval and HalluQA with zero-shot inference. Such an annotator can not only evaluate the hallucination levels of various LLMs on the large-scale dataset but also help mitigate hallucination in LLM generations, with the Natural Language Inference (NLI) metric increasing from 25% to 37% on HaluEval.
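The annotate-then-retrain loop described in the abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's implementation: examples are reduced to a single float score, the annotator is a simple threshold rule, and every name (`make_threshold_annotator`, `train_annotator`, `iterative_self_training`) is hypothetical. It only shows the control flow in which each iteration labels a larger pool with the current annotator and then fits a new annotator on those labels.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-ins: a real example would be a (question, answer) pair
# and a real annotator would be a fine-tuned LLM.
Example = float
Annotator = Callable[[Example], int]  # 1 = hallucinated, 0 = not hallucinated
Labeled = List[Tuple[Example, int]]


def make_threshold_annotator(threshold: float) -> Annotator:
    """Toy annotator: flag an example when its score reaches the threshold."""
    return lambda x: 1 if x >= threshold else 0


def train_annotator(dataset: Labeled) -> Annotator:
    """Toy training step: place the threshold midway between the class means."""
    pos = [x for x, y in dataset if y == 1]
    neg = [x for x, y in dataset if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present in the annotated data")
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return make_threshold_annotator(threshold)


def iterative_self_training(
    seed: Annotator,
    pools: Sequence[Sequence[Example]],
    train_fn: Callable[[Labeled], Annotator],
) -> Tuple[Annotator, Labeled]:
    """Alternate between annotating a (growing) pool and retraining."""
    annotator = seed
    dataset: Labeled = []
    for pool in pools:
        # Annotation step: label the scaled pool with the current annotator.
        dataset = [(x, annotator(x)) for x in pool]
        # Training step: fit a new annotator on the freshly annotated data.
        annotator = train_fn(dataset)
    return annotator, dataset


# Usage: a deliberately poor seed annotator and two pools of increasing size.
seed = make_threshold_annotator(0.25)
pools = [[0.1, 0.2, 0.8, 0.9], [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]]
final_annotator, final_dataset = iterative_self_training(seed, pools, train_annotator)
```

In this toy run the seed annotator flags the score 0.3, while the annotator obtained after two iterations does not, because retraining moves the decision threshold toward the midpoint of the two clusters. The real framework replaces both steps with an LLM-based annotation pipeline and model fine-tuning.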

Alleviating Hallucinations in Large Language Models with Scepticism Modeling

Towards Mitigating Hallucination in Large Language Models via Self-Reflection

The Dawn After the Dark: An Empirical Study on Factuality Hallucination in Large Language Models

Mitigating Entity-Level Hallucination in Large Language Models

Learning to Trust Your Feelings: Leveraging Self-awareness in LLMs for Hallucination Mitigation

Mitigating Large Language Model Hallucination with Faithful Finetuning

Iter-AHMCL: Alleviate Hallucination for Large Language Model via Iterative Model-level Contrastive Learning

Alleviating Hallucinations of Large Language Models through Induced Hallucinations

ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models

Quantifying and Attributing the Hallucination of Large Language Models via Association Analysis

Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback

Enhancing Trust in Large Language Models with Uncertainty-Aware Fine-Tuning

Detecting Hallucinations in Large Language Model Generation: A Token Probability Approach

Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus

Mechanistic Understanding and Mitigation of Language Model Non-Factual Hallucinations

LLM Internal States Reveal Hallucination Risk Faced With a Query

A Unified Hallucination Mitigation Framework for Large Vision-Language Models

Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation

Zero-Resource Hallucination Prevention for Large Language Models

A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions