ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models

Yuzhe Gu, Ziwei Ji, Wenwei Zhang, Chengqi Lyu, Dahua Lin, Kai Chen
2024-07-06
Abstract: Large language models (LLMs) exhibit hallucinations in long-form question-answering tasks across various domains and wide applications. Current hallucination detection and mitigation datasets are limited in domain and size, and they struggle to scale due to prohibitive labor costs and the insufficient reliability of existing hallucination annotators. To facilitate the scalable oversight of LLM hallucinations, this paper introduces an iterative self-training framework that simultaneously and progressively scales up the hallucination annotation dataset and improves the accuracy of the hallucination annotator. Based on the Expectation Maximization (EM) algorithm, in each iteration, the framework first applies a hallucination annotation pipeline to annotate a scaled dataset and then trains a more accurate hallucination annotator on the dataset. This new hallucination annotator is adopted in the hallucination annotation pipeline used for the next iteration. Extensive experimental results demonstrate that the finally obtained hallucination annotator with only 7B parameters surpasses the performance of GPT-4 and obtains new state-of-the-art hallucination detection results on HaluEval and HalluQA by zero-shot inference. Such an annotator can not only evaluate the hallucination levels of various LLMs on the large-scale dataset but also help to mitigate the hallucination of LLM generations, with the Natural Language Inference (NLI) metric increasing from 25% to 37% on HaluEval.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the "hallucination" problem in large language models (LLMs) on long-form question-answering tasks, where a model generates inaccurate or unsupported information. Existing hallucination detection and mitigation datasets are limited in domain and scale, and they are hard to grow because human labeling is expensive and existing hallucination annotators are not reliable enough.

To address this, the paper proposes an iterative self-training framework that simultaneously scales up the hallucination annotation dataset and improves the accuracy of the annotator. The framework is based on the Expectation-Maximization (EM) algorithm and runs over multiple iterations. In the Expectation (E) step, the current best hallucination annotator is used to annotate the scaled-up dataset. In the Maximization (M) step, these annotations, together with the newly added data, are used to train a more accurate hallucination annotator, which then serves as the annotator in the next iteration. The process runs in three stages, each scaling the data along dimensions such as the number of topics and questions, so that the annotator's performance remains stable across topics.

Experimental results show that the final hallucination annotator, with only 7 billion parameters, surpasses GPT-4 and achieves new state-of-the-art results on HaluEval and HalluQA in a zero-shot setting. The annotator can also automatically evaluate the hallucination levels of different LLMs on large-scale datasets and help mitigate hallucinations in LLM generations, improving the NLI metric from 25% to 37% on HaluEval.

In summary, the paper uses iterative self-training to improve both the accuracy of hallucination detection and the scale of the annotated dataset, and it provides a practical tool for mitigating hallucinations in LLMs.
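The alternation between the E step and the M step can be illustrated with a toy sketch. This is not the paper's implementation: the real annotator is a 7B-parameter LLM, whereas here it is replaced, purely as an assumption for illustration, by a one-dimensional threshold over a hypothetical "support score" per answer. The function names (`annotate`, `train_annotator`, `self_train`) and all data values are invented for the example.

```python
from statistics import mean
from typing import List, Sequence


def annotate(threshold: float, scores: Sequence[float]) -> List[bool]:
    """E-step stand-in: the current annotator labels every item.

    An item is labeled as hallucinated (True) when its hypothetical
    support score falls below the annotator's threshold.
    """
    return [s < threshold for s in scores]


def train_annotator(scores: Sequence[float], labels: Sequence[bool],
                    fallback: float) -> float:
    """M-step stand-in: refit the annotator on the labeled data.

    The toy annotator is a single threshold, set to the midpoint between
    the mean score of each class. If one class is empty, keep `fallback`.
    """
    hallucinated = [s for s, y in zip(scores, labels) if y]
    supported = [s for s, y in zip(scores, labels) if not y]
    if not hallucinated or not supported:
        return fallback
    return (mean(hallucinated) + mean(supported)) / 2


def self_train(seed_scores: Sequence[float], seed_labels: Sequence[bool],
               stages: Sequence[Sequence[float]]) -> List[float]:
    """Run one E/M round per stage, growing the unlabeled pool each time."""
    threshold = train_annotator(seed_scores, seed_labels, fallback=0.5)
    history = [threshold]
    pool: List[float] = []
    for new_items in stages:
        pool.extend(new_items)                      # scale up the dataset
        pseudo_labels = annotate(threshold, pool)   # E step
        threshold = train_annotator(                # M step
            list(seed_scores) + pool,
            list(seed_labels) + pseudo_labels,
            fallback=threshold,
        )
        history.append(threshold)
    return history


# Hypothetical data: two gold-labeled seed items, then three growing stages.
seed_scores = [0.1, 0.9]
seed_labels = [True, False]
stages = [[0.2, 0.8], [0.3, 0.4, 0.7], [0.45, 0.6, 0.95]]
history = self_train(seed_scores, seed_labels, stages)
```

Each entry in `history` is the annotator after one more stage. The point of the sketch is the control flow: the annotator trained in one round produces the labels used to train the next. It says nothing about how much accuracy the real framework gains per round.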