DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations

Aryo Pradipta Gema, Chen Jin, Ahmed Abdulaal, Tom Diethe, Philip Teare, Beatrice Alex, Pasquale Minervini, Amrutha Saseendran
2024-10-24
Abstract: Large Language Models (LLMs) often hallucinate, producing unfaithful or factually incorrect outputs by misrepresenting the provided context or incorrectly recalling internal knowledge. Recent studies have identified specific attention heads within the Transformer architecture, known as retrieval heads, responsible for extracting relevant contextual information. We hypothesise that masking these retrieval heads can induce hallucinations and that contrasting the outputs of the base LLM and the masked LLM can reduce hallucinations. To this end, we propose Decoding by Contrasting Retrieval Heads (DeCoRe), a novel training-free decoding strategy that amplifies information found in the context and model parameters. DeCoRe mitigates potentially hallucinated responses by dynamically contrasting the outputs of the base LLM and the masked LLM, using conditional entropy as a guide. Our extensive experiments confirm that DeCoRe significantly improves performance on tasks requiring high contextual faithfulness, such as summarisation (XSum by 18.6%), instruction following (MemoTrap by 10.9%), and open-book question answering (NQ-Open by 2.4% and NQ-Swap by 5.5%).
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses the hallucination problem that often occurs when large language models (LLMs) generate text. Specifically, LLMs sometimes produce unfaithful or factually incorrect content, either by misrepresenting the provided context or by incorrectly recalling internal knowledge. Hallucination seriously undermines the reliability of LLMs, especially in high-risk applications such as clinical decision-making or legal reasoning.

### Overview of the solution

To address this problem, the authors propose a training-free decoding strategy named **Decoding by Contrasting Retrieval Heads (DeCoRe)**. Its main features are as follows:

1. **Identifying retrieval heads**: Recent studies have identified certain attention heads in the Transformer architecture, called "retrieval heads", that are responsible for extracting relevant information from the given context. The authors hypothesise that masking these heads induces hallucinations, which makes the masked model a useful reference for what hallucinated output looks like.
2. **Contrastive decoding**: DeCoRe reduces hallucinations by contrasting the outputs of the base LLM with those of the same LLM with its retrieval heads masked, thereby amplifying the information found in the context and model parameters.
3. **Dynamically adjusting the contrast strength**: To control the contrastive decoding process, DeCoRe uses conditional entropy as a guide. Conditional entropy reflects the model's uncertainty about the next token. When the conditional entropy is high, the contrast strength is increased, reducing the chance of generating hallucinated content.

### Experimental results

The authors verify the effectiveness of DeCoRe through a series of experiments.
The experimental results show that DeCoRe significantly improves performance on tasks that require high contextual faithfulness, such as:

- **Summarisation** (XSum dataset): performance improves by 18.6%.
- **Instruction following** (MemoTrap dataset): performance improves by 10.9%.
- **Open-book question answering** (NQ-Open and NQ-Swap datasets): performance improves by 2.4% and 5.5% respectively.

In addition, DeCoRe also performs well on factual recall tasks. For example, on the TriviaQA and PopQA datasets, DeCoRe significantly improves the accuracy of the model.

### Conclusion

By contrasting the outputs of the base LLM and the LLM with masked retrieval heads, DeCoRe effectively reduces hallucinations and improves the performance of LLMs on multiple tasks. The method performs well not only on tasks that require contextual faithfulness but also on factual recall tasks.
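
The entropy-guided contrast described in the overview can be illustrated with a small sketch. This is not the authors' implementation: the combination rule `(1 + alpha) * log p_base - alpha * log p_masked`, the choice of `alpha` as the entropy of the base model's next-token distribution, and all function names and the `scale` parameter are assumptions made for illustration. The two logit vectors stand in for the outputs of the base LLM and of the same LLM with its retrieval heads masked.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def entropy(log_probs):
    """Shannon entropy (in nats) of a distribution given as log-probabilities."""
    return -sum(math.exp(lp) * lp for lp in log_probs)

def contrast_step(base_logits, masked_logits, scale=1.0):
    """One decoding step of an entropy-guided contrast (illustrative only).

    ASSUMPTION: alpha is the entropy of the base distribution times `scale`,
    and the scores are (1 + alpha) * log p_base - alpha * log p_masked.
    Returns (alpha, renormalised log-probabilities).
    """
    base_lp = log_softmax(base_logits)
    masked_lp = log_softmax(masked_logits)
    alpha = scale * entropy(base_lp)  # higher uncertainty -> stronger contrast
    scores = [(1 + alpha) * b - alpha * m for b, m in zip(base_lp, masked_lp)]
    return alpha, log_softmax(scores)

def greedy_pick(log_probs):
    """Index of the highest-scoring token."""
    return max(range(len(log_probs)), key=log_probs.__getitem__)

# Toy vocabulary of three tokens. The base model is torn between tokens 0
# and 1; the masked model (hypothetically hallucinating) prefers token 1.
base_logits = [2.0, 1.9, 0.0]
masked_logits = [0.5, 2.5, 0.0]

alpha, contrasted = contrast_step(base_logits, masked_logits)
print("alpha:", alpha)
print("chosen token:", greedy_pick(contrasted))
```

In this toy case the base model is uncertain, so `alpha` is relatively large and the token favoured by the masked model is penalised. When the base model is already confident, the entropy is close to zero and the contrasted distribution stays close to the base distribution.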