Zero-Resource Hallucination Prevention for Large Language Models

Junyu Luo, Cao Xiao, Fenglong Ma
2023-10-08
Abstract: The prevalent use of large language models (LLMs) in various domains has drawn attention to the issue of "hallucination," which refers to instances where LLMs generate factually inaccurate or ungrounded information. Existing techniques for hallucination detection in language assistants rely on intricate fuzzy, specific free-language-based chain of thought (CoT) techniques or parameter-based methods that suffer from interpretability issues. Additionally, the methods that identify hallucinations post-generation could not prevent their occurrence and suffer from inconsistent performance due to the influence of the instruction format and model style. In this paper, we introduce a novel pre-detection self-evaluation technique, referred to as SELF-FAMILIARITY, which focuses on evaluating the model's familiarity with the concepts present in the input instruction and withholding the generation of response in case of unfamiliar concepts. This approach emulates the human ability to refrain from responding to unfamiliar topics, thus reducing hallucinations. We validate SELF-FAMILIARITY across four different large language models, demonstrating consistently superior performance compared to existing techniques. Our findings propose a significant shift towards preemptive strategies for hallucination mitigation in LLM assistants, promising improvements in reliability, applicability, and interpretability.
Computation and Language
What problem does this paper attempt to address?
The paper addresses "hallucination" in large language models (LLMs): cases where a model generates factually inaccurate or ungrounded information. According to the authors, existing detection techniques rely either on intricate, fuzzy free-language chain-of-thought (CoT) techniques or on parameter-based methods that suffer from interpretability issues. Methods that identify hallucinations after generation also cannot prevent them from occurring, and their performance is inconsistent because it depends on the instruction format and the model's style. The paper therefore proposes a pre-detection self-evaluation technique called SELF-FAMILIARITY, which evaluates the model's familiarity with the concepts in the input instruction and withholds the response when a concept is unfamiliar. This emulates the human ability to refrain from responding to unfamiliar topics. The authors validate SELF-FAMILIARITY on four different large language models and report consistently superior performance compared with existing techniques, which they present as evidence that preemptive strategies can improve the reliability, applicability, and interpretability of LLM assistants.
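
To make the pre-detection idea concrete, here is a minimal Python sketch of a familiarity gate placed in front of generation. It is an illustration under stated assumptions, not the authors' implementation: the function gate_response, the callbacks extract_concepts, familiarity, and generate, the min-based aggregation, and the 0.5 threshold are all hypothetical, and the toy_* helpers are stand-ins for a real model. The text above does not say how SELF-FAMILIARITY extracts or scores concepts.

```python
from typing import Callable, Iterable, List

# Hypothetical refusal message returned when the gate withholds a response.
REFUSAL = "I am not familiar enough with this topic to answer reliably."


def gate_response(
    instruction: str,
    extract_concepts: Callable[[str], Iterable[str]],
    familiarity: Callable[[str], float],
    generate: Callable[[str], str],
    threshold: float = 0.5,
) -> str:
    """Check familiarity with the instruction's concepts before generating.

    All three callbacks are caller-supplied placeholders; the scoring and
    aggregation used here are assumptions made for illustration only.
    """
    concepts = list(extract_concepts(instruction))
    # Assumption: the instruction is only as familiar as its least familiar
    # concept. With no extracted concepts, the gate lets the request through.
    score = min((familiarity(c) for c in concepts), default=1.0)
    if score < threshold:
        return REFUSAL  # withhold the response instead of risking hallucination
    return generate(instruction)


# Toy stand-ins so the sketch runs without a real model.
TOY_SCORES = {"aspirin": 0.9, "zorblax": 0.1}


def toy_extract(text: str) -> List[str]:
    # Treat any whitespace-separated word found in the toy table as a concept.
    return [w for w in text.lower().split() if w in TOY_SCORES]


def toy_familiarity(concept: str) -> float:
    return TOY_SCORES.get(concept, 0.0)


def toy_generate(text: str) -> str:
    return f"[model answer to: {text}]"


if __name__ == "__main__":
    for prompt in ("What does aspirin do", "What does zorblax do"):
        print(gate_response(prompt, toy_extract, toy_familiarity, toy_generate))
```

In this sketch the check runs before any text is generated, which mirrors the preemptive framing above: an unfamiliar concept stops the request at the gate, whereas a post-generation detector would first produce an answer and then flag it.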