Maintaining Informative Coherence: Migrating Hallucinations in Large Language Models via Absorbing Markov Chains

Jiemin Wu, Songning Lai, Ruiqiang Xiao, Tianlang Xue, Jiayu Yang, Yutao Yue
2024-10-27
Abstract: Large Language Models (LLMs) are powerful tools for text generation, translation, and summarization, but they often suffer from hallucinations, instances where they fail to maintain the fidelity and coherence of contextual information during decoding, sometimes overlooking critical details due to their sampling strategies and inherent biases from training data and fine-tuning discrepancies. These hallucinations can propagate through the web, affecting the trustworthiness of information disseminated online. To address this issue, we propose a novel decoding strategy that leverages absorbing Markov chains to quantify the significance of contextual information and measure the extent of information loss during generation. By considering all possible paths from the first to the last token, our approach enhances the reliability of model outputs without requiring additional training or external data. Evaluations on datasets including TruthfulQA, FACTOR, and HaluEval highlight the superior performance of our method in mitigating hallucinations, underscoring the necessity of ensuring accurate information flow in web-based applications.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the hallucination problem that arises when large language models (LLMs) generate text. Specifically, during decoding an LLM may overlook key information in the context, producing content that is linguistically coherent but lacks a factual basis. Such errors undermine the credibility of information disseminated online and pose significant risks in fields that require precision and authenticity.

### Main Problems and Solutions in the Paper

#### 1. **Problem Description**

- **Hallucination Phenomenon**: When LLMs generate text, they sometimes produce content that does not conform to the facts. For example, for the simple question "What is the capital of Australia?", the correct answer is "Canberra", but an LLM may generate a plausible but wrong answer such as "Sydney". This phenomenon is called hallucination.
- **Cause Analysis**:
  - LLMs rely on statistical patterns in the training data and lack a clear understanding of the content or context.
  - The model may overlook key information from the early context when generating the next word.
  - Sampling strategies and biases in the training data may also cause the model to hallucinate.

#### 2. **Solutions**

To address these problems, the paper proposes a novel decoding strategy based on Absorbing Markov Chains (AMCs). The method improves the reliability of model output by quantifying the importance of contextual information and measuring the degree of information loss during generation. The specific steps are as follows:

- **Modeling Information Flow**: Treat generation as a process in which information propagates from the given context to the new content. Considering all possible paths from the first token to the last gives a more comprehensive measure of the key information and how it flows.
- **Application of Absorbing Markov Chains**: Using AMC theory, regard the last token as an absorbing state and consider all possible paths from the context prefix to subsequent tokens. This quantifies the information content of each context token, so that the model pays more attention to neglected tokens during decoding and the generated content stays accurate and coherent.
- **Adjusting Token Probability Distribution**: Adjust the probability distribution of the next token according to the information loss, in order to reduce hallucination. The adjustment formula is:

  \[ \tilde{D}(t) = D(t) + \lambda \sum_{i = 0}^{t - 1}\text{Norm}(L_{\text{info}}(i)\cdot D(i)) \]

  where \(D(t)\) is the original token probability distribution, \(\lambda\) is a tuning parameter, and \(L_{\text{info}}(i)\) is the information loss of the \(i\)-th token.

#### 3. **Experimental Verification**

The method was evaluated on multiple datasets, including TruthfulQA, FACTOR, and HaluEval. The results indicate that it performs well at reducing hallucination, highlighting the need to ensure accurate information flow in web-based applications.

### Summary

By introducing the theory of Absorbing Markov Chains, this paper provides a novel decoding strategy aimed at reducing hallucination when LLMs generate text, thereby improving the reliability and accuracy of the generated content. The method requires no additional training or external data, and it can be applied to a variety of large-model architectures.
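
To make the two ingredients described above more concrete, the following is a minimal Python sketch, not the authors' implementation. The first function applies standard absorbing-Markov-chain theory: for a transition matrix in canonical form with transient block \(Q\) and absorbing block \(R\), the fundamental matrix \(N = (I - Q)^{-1}\) sums over all paths between transient states, and \(B = NR\) gives absorption probabilities. The second function evaluates the adjustment formula for \(\tilde{D}(t)\). Several things here are assumptions and do not come from the source: the function names, the toy transition matrix, the reading of \(\text{Norm}\) as rescaling the loss weights so they sum to 1 over positions, and the optional final renormalization. How the paper derives transition probabilities and \(L_{\text{info}}\) from a model is not specified in this summary and is not modeled here.

```python
import numpy as np


def absorbing_chain_stats(P, n_transient):
    """Standard AMC quantities for P in canonical form [[Q, R], [0, I]].

    Returns the fundamental matrix N = (I - Q)^-1, whose entry (i, j) is the
    expected number of visits to transient state j when starting in i
    (a sum over all paths), and the absorption matrix B = N @ R.
    """
    P = np.asarray(P, dtype=float)
    Q = P[:n_transient, :n_transient]
    R = P[:n_transient, n_transient:]
    N = np.linalg.inv(np.eye(n_transient) - Q)
    B = N @ R
    return N, B


def adjust_distribution(D, L_info, lam, renormalize=True):
    """Sketch of D~(t) = D(t) + lam * sum_i Norm(L_info(i) * D(i)).

    D      : array of shape (t + 1, V); row i is the distribution at position i.
    L_info : array of shape (t,); hypothetical information-loss scores.
    ASSUMPTION: Norm rescales the loss weights to sum to 1 over positions.
    ASSUMPTION: the result is renormalized so it is again a distribution.
    """
    D = np.asarray(D, dtype=float)
    L_info = np.asarray(L_info, dtype=float)
    t = D.shape[0] - 1
    total = L_info[:t].sum()
    if total == 0:
        return D[t].copy()  # no recorded loss, nothing to add back
    weights = L_info[:t] / total
    correction = weights @ D[:t]  # loss-weighted mix of earlier distributions
    adjusted = D[t] + lam * correction
    return adjusted / adjusted.sum() if renormalize else adjusted


# Toy chain over 4 tokens: information flows forward, the last token absorbs.
P = np.array([
    [0.0, 0.5, 0.3, 0.2],
    [0.0, 0.0, 0.6, 0.4],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
])
N, B = absorbing_chain_stats(P, n_transient=3)
print("visits from first token:", N[0])
print("absorption probabilities:", B.ravel())

# Toy distributions over a 3-word vocabulary at positions 0, 1, 2.
D = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
print("adjusted:", adjust_distribution(D, L_info=[0.75, 0.25], lam=0.5))
```

In the toy chain, the first row of `N` is approximately 1, 0.5, 0.6: the second token is reached directly with probability 0.5, while the third is reached either directly (0.3) or through the second (0.5 × 0.6), which is the "all possible paths" accounting that the fundamental matrix provides. In the adjustment example, the unadjusted distribution favors the third word, while the adjusted one favors the first, because the early position with the larger assumed loss put most of its mass there.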