Abstract: Large language models (LLMs) exhibit hallucinations in long-form question-answering tasks across a wide range of domains and applications. Current hallucination detection and mitigation datasets are limited in domain and size, and they are hard to scale because of prohibitive labor costs and the insufficient reliability of existing hallucination annotators. To facilitate scalable oversight of LLM hallucinations, this paper introduces an iterative self-training framework that simultaneously and progressively scales up the hallucination annotation dataset and improves the accuracy of the hallucination annotator. Based on the Expectation Maximization (EM) algorithm, in each iteration the framework first applies a hallucination annotation pipeline to annotate a scaled-up dataset and then trains a more accurate hallucination annotator on that dataset. This new hallucination annotator is adopted in the hallucination annotation pipeline for the next iteration. Extensive experimental results demonstrate that the final hallucination annotator, with only 7B parameters, surpasses the performance of GPT-4 and achieves new state-of-the-art hallucination detection results on HaluEval and HalluQA with zero-shot inference. Such an annotator can not only evaluate the hallucination levels of various LLMs on the large-scale dataset but also help mitigate hallucinations in LLM generations, raising the Natural Language Inference (NLI) metric on HaluEval from 25% to 37%.
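The iterative loop described in the abstract has the shape of a hard-EM (self-training) loop. The sketch below is a minimal toy illustration under stated assumptions, not the authors' implementation: the `Annotator` class with its single-threshold "model", the functions `e_step`, `m_step`, and `self_train`, and all numeric data are hypothetical stand-ins (the real annotator is an LLM trained on annotated responses). It only shows the control flow: annotate a larger pool with the current annotator, train a new annotator on those annotations, and reuse that annotator in the next iteration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Tuple


# Hypothetical toy annotator: each item is reduced to one float score and is
# flagged as hallucinated when the score exceeds a learned threshold.
# A real hallucination annotator would be an LLM; only the loop matters here.
@dataclass(frozen=True)
class Annotator:
    threshold: float

    def annotate(self, score: float) -> int:
        # 1 = hallucinated, 0 = not hallucinated
        return int(score > self.threshold)


def e_step(annotator: Annotator, pool: List[float]) -> List[Tuple[float, int]]:
    """Annotate the (scaled-up) pool with the current annotator."""
    return [(x, annotator.annotate(x)) for x in pool]


def m_step(labeled: List[Tuple[float, int]], previous: Annotator) -> Annotator:
    """Train a new annotator on the annotated pool.

    Toy "training": place the threshold halfway between the mean score of
    each label. If one label is absent, keep the previous annotator.
    """
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == 0]
    if not pos or not neg:
        return previous
    return Annotator(threshold=(mean(pos) + mean(neg)) / 2)


def self_train(seed: Annotator, stages: List[List[float]]):
    """Run one E-step and one M-step per stage; later pools are larger."""
    annotator, history = seed, []
    for pool in stages:
        labeled = e_step(annotator, pool)
        annotator = m_step(labeled, annotator)  # reused in the next iteration
        history.append(annotator.threshold)
    return annotator, history


# Hypothetical data: two progressively larger pools of scores.
stages = [
    [0.1, 0.2, 0.3, 0.7, 0.8, 0.9],
    [0.1, 0.15, 0.2, 0.25, 0.3, 0.7, 0.75, 0.8, 0.85, 0.9],
]
final, history = self_train(Annotator(threshold=0.25), stages)
print(history)
```

Starting from a deliberately biased seed threshold of 0.25, the first iteration moves the threshold to about 0.41 and the second to about 0.5, which illustrates, in a toy setting only, how each iteration's annotator feeds the next one.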

What problem does this paper attempt to address?

The paper focuses on hallucination in large language models (LLMs) on long-form question-answering tasks, where models generate information that is fabricated or not grounded in fact. Existing hallucination detection and mitigation datasets are limited in domain and scale, and expanding them is difficult because manual annotation is expensive and existing automatic hallucination annotators are not accurate enough.

To address this, the paper proposes an iterative self-training framework that simultaneously scales up the hallucination annotation dataset and improves the accuracy of the annotator. The framework is based on the Expectation-Maximization (EM) algorithm and runs over multiple iterations. In the Expectation (E) step, the current best hallucination annotator is used to annotate the scaled-up dataset. In the Maximization (M) step, the newly annotated data is used to train a more accurate hallucination annotator, which then serves as the annotator in the next E step. The process is organized into three stages that progressively scale the data along dimensions such as the number of topics and questions, which helps stabilize the annotator's performance across topics.

Experimental results show that the final hallucination annotator, with only 7 billion parameters, surpasses GPT-4 and achieves new state-of-the-art zero-shot results on HaluEval and HalluQA. The annotator can also automatically evaluate the hallucination levels of different LLMs on the large-scale dataset and help mitigate hallucinations in LLM generations, raising the NLI metric on HaluEval from 25% to 37%.

In summary, the paper tackles LLM hallucination with an iterative self-training approach that improves both the accuracy of hallucination detection and the scale of the annotation dataset, and it provides a practical tool for mitigating hallucinations in LLMs.
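The summary describes the E step only as annotating the scaled-up data with the current best annotator. One plausible way to make such automatic labels more reliable, assumed here rather than taken from the summary, is to sample several annotations per sentence and keep only labels that win a clear majority. In the sketch below, `majority_label`, `annotate_by_vote`, the `min_agreement` parameter, and the label strings are all hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def majority_label(samples: Sequence[str], min_agreement: float = 0.5) -> Optional[str]:
    """Return the most frequent label if its vote share exceeds min_agreement."""
    if not samples:
        return None
    label, count = Counter(samples).most_common(1)[0]
    return label if count / len(samples) > min_agreement else None


def annotate_by_vote(
    per_sentence_samples: List[Sequence[str]], min_agreement: float = 0.5
) -> List[Tuple[int, str]]:
    """Aggregate sampled annotations per sentence.

    Sentences without a clear majority are dropped, so only confident
    labels would be passed on as training data for the M step.
    """
    kept = []
    for idx, samples in enumerate(per_sentence_samples):
        label = majority_label(samples, min_agreement)
        if label is not None:
            kept.append((idx, label))
    return kept


# Hypothetical labels sampled from the current annotator for three sentences.
samples = [
    ["contradictory", "contradictory", "no_hallucination"],
    ["unverifiable", "no_hallucination"],
    ["no_hallucination", "no_hallucination", "no_hallucination"],
]
print(annotate_by_vote(samples))
# [(0, 'contradictory'), (2, 'no_hallucination')]
```

The second sentence is dropped because its two samples disagree. With a strict majority threshold, such ties never enter the training data, which trades dataset size for label reliability.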

ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models