Mitigating Large Language Model Hallucination with Faithful Finetuning

Minda Hu, Bowei He, Yufei Wang, Liangyou Li, Chen Ma, Irwin King
2024-06-17
Abstract: Large language models (LLMs) have demonstrated remarkable performance on various natural language processing tasks. However, they are prone to generating fluent yet untruthful responses, known as "hallucinations". Hallucinations can lead to the spread of misinformation and cause harm in critical applications. Mitigating hallucinations is challenging as they arise from factors such as noisy data, model overconfidence, lack of knowledge, and the generation process itself. Recent efforts have attempted to address this issue through representation editing and decoding algorithms, reducing hallucinations without major structural changes or retraining. However, these approaches either implicitly edit LLMs' behavior in latent space or suppress the tendency to output unfaithful results during decoding, rather than explicitly modeling hallucination. In this work, we introduce Faithful Finetuning (F2), a novel method that explicitly models the process of faithful question answering through carefully designed loss functions during fine-tuning. We conduct extensive experiments on popular datasets and demonstrate that F2 achieves significant improvements over vanilla models and baselines.
Computation and Language
What problem does this paper attempt to address?
The paper aims to address "hallucinations" in text generated by large language models (LLMs). Specifically:

1. **Problem Background**: Despite their impressive performance on natural language processing tasks, LLMs tend to generate fluent but untruthful responses, a phenomenon known as "hallucination." These inaccuracies reduce the models' reliability and can cause harm in critical applications.
2. **Research Objective**: To reduce hallucinations, particularly in question-answering (QA) tasks, the paper proposes Faithful Finetuning (F2), a method that explicitly models the process of generating faithful responses through carefully designed loss functions applied during finetuning.
3. **Specific Approach**: F2 first decomposes the conventional QA objective into two sub-goals, internal fact retrieval and fact-based QA, and designs targeted finetuning strategies for them to strengthen the model's ability to use factual information. It then identifies the layers and hotspots where the model is prone to hallucinate and applies weighted training to them, further improving the model's accuracy and reliability.

In summary, the paper seeks to significantly reduce hallucinations in LLM-generated text through a new finetuning method, thereby making these models more reliable and trustworthy in practical applications.
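To make the idea of decomposed sub-goals plus weighted training more concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' implementation: this summary does not give F2's actual loss terms, weighting values, or how hotspots and layers are selected. The sketch assumes that each sub-goal (internal fact retrieval and fact-based QA) is trained with a token-level negative log-likelihood, that hotspot tokens are marked by a boolean mask, and that the two sub-losses are combined as a weighted sum. The names `weighted_nll`, `f2_style_loss`, `hotspot_weight`, `alpha`, and `beta` are hypothetical, and the layer-selection part of the method is not modeled.

```python
import math
from typing import Sequence


def weighted_nll(
    token_logprobs: Sequence[float],
    hotspot_mask: Sequence[bool],
    hotspot_weight: float = 2.0,
) -> float:
    """Weighted mean negative log-likelihood over target tokens.

    Hypothetical: tokens flagged in `hotspot_mask` (e.g. entity tokens) get
    `hotspot_weight`; every other token gets weight 1.0.
    """
    if len(token_logprobs) != len(hotspot_mask):
        raise ValueError("token_logprobs and hotspot_mask must have the same length")
    if not token_logprobs:
        raise ValueError("need at least one target token")
    weights = [hotspot_weight if is_hot else 1.0 for is_hot in hotspot_mask]
    weighted_sum = sum(-w * lp for w, lp in zip(weights, token_logprobs))
    # Normalize by total weight so the loss stays on the same scale as plain NLL.
    return weighted_sum / sum(weights)


def f2_style_loss(
    retrieval_logprobs: Sequence[float],
    retrieval_mask: Sequence[bool],
    grounded_qa_logprobs: Sequence[float],
    grounded_qa_mask: Sequence[bool],
    alpha: float = 1.0,
    beta: float = 1.0,
    hotspot_weight: float = 2.0,
) -> float:
    """Hypothetical combination of the two sub-goal losses as a weighted sum."""
    # Sub-goal 1 (assumed form): generate the relevant facts from internal knowledge.
    l_retrieval = weighted_nll(retrieval_logprobs, retrieval_mask, hotspot_weight)
    # Sub-goal 2 (assumed form): answer the question conditioned on those facts.
    l_grounded = weighted_nll(grounded_qa_logprobs, grounded_qa_mask, hotspot_weight)
    return alpha * l_retrieval + beta * l_grounded


if __name__ == "__main__":
    # Toy per-token log-probabilities; the second token is flagged as a hotspot.
    lp = [math.log(0.5), math.log(0.25)]
    mask = [False, True]
    print(round(weighted_nll(lp, mask, hotspot_weight=1.0), 4))  # 1.0397
    print(round(weighted_nll(lp, mask, hotspot_weight=2.0), 4))  # 1.1552
    print(round(f2_style_loss(lp, mask, lp, mask), 4))  # 2.3105
```

With `hotspot_weight` set to 1.0 the function reduces to the ordinary mean token NLL; raising it makes errors on the flagged tokens contribute more to the loss. In an actual finetuning setup the log-probabilities would come from the model inside an autodiff framework rather than from hand-written lists.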