Investigating and Addressing Hallucinations of LLMs in Tasks Involving Negation

Neeraj Varshney, Satyam Raj, Venkatesh Mishra, Agneet Chatterjee, Ritika Sarkar, Amir Saeidi, Chitta Baral
2024-06-08
Abstract: Large Language Models (LLMs) have achieved remarkable performance across a wide variety of natural language tasks. However, they have been shown to suffer from a critical limitation pertinent to 'hallucination' in their output. Recent research has focused on investigating and addressing this problem for a variety of tasks such as biography generation, question answering, abstractive summarization, and dialogue generation. However, the crucial aspect pertaining to 'negation' has remained considerably underexplored. Negation is important because it adds depth and nuance to the understanding of language and is also crucial for logical reasoning and inference. In this work, we address the above limitation and particularly focus on studying the impact of negation in LLM hallucinations. Specifically, we study four tasks with negation: 'false premise completion', 'constrained fact generation', 'multiple choice question answering', and 'fact generation'. We show that open-source state-of-the-art LLMs such as LLaMA-2-chat, Vicuna, and Orca-2 hallucinate considerably on all these tasks involving negation which underlines a critical shortcoming of these models. Addressing this problem, we further study numerous strategies to mitigate these hallucinations and demonstrate their impact.
Computation and Language
What problem does this paper attempt to address?
This paper investigates the "hallucination" problem of large language models (LLMs) in language tasks involving negation. Specifically, the research covers the following aspects:

1. **Hallucination problem**: Although large language models perform well on many natural language processing tasks, they are prone to generating text that reads fluently but is factually wrong or does not match the input source. This phenomenon is called "hallucination".
2. **The influence of negation**: Negation is an important element of language. It adds depth and nuance to language understanding and is crucial for logical reasoning and inference. However, its influence has not been fully explored in current research on LLM hallucination.
3. **Research tasks**: The authors study four tasks involving negation: False Premise Completion (FPC), Constrained Fact Generation (CFG), Multiple-Choice Question Answering (MCQA), and Fact Generation (FG). These tasks evaluate how well LLMs handle negation.
4. **Model performance**: Open-source LLMs such as LLaMA-2-chat, Vicuna-v1.5, and Orca-2 hallucinate considerably on all of these tasks. For example, the average hallucination rates of these models are 63.77%, 72.33%, 36.6%, and 62.59% on the FPC, CFG, MCQA, and FG tasks respectively.
5. **Mitigation strategies**: To reduce hallucination, the authors study several mitigation strategies: Cautionary Instruction, Demonstrative Exemplars, Self-Refinement, and Knowledge-Augmented Generation. The experimental results show that every strategy except Knowledge-Augmented Generation reduces hallucination to some extent.

### Summary

The main purpose of this paper is to explore and mitigate the hallucination problem of LLMs in language tasks involving negation.
Through a detailed study of four specific tasks, the authors reveal the limitations of existing models in handling negation and evaluate several mitigation strategies. This research provides a useful reference for the future development of more robust LLMs.
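To make two of the mitigation strategies and the reported metric concrete, here is a minimal Python sketch of how a cautionary instruction and demonstrative exemplars might be prepended to a task input, and how a hallucination rate could be computed from per-response judgements. Everything named here is an assumption for illustration: `CAUTIONARY_INSTRUCTION`, `build_prompt`, `hallucination_rate`, the `Input:`/`Output:` template, and the sample prompts are hypothetical and do not reproduce the paper's actual wording, prompts, or evaluation procedure.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical wording; the paper's actual instruction text is not quoted here.
CAUTIONARY_INSTRUCTION = (
    "Note: the input may contain a negation or a false premise. "
    "Check it carefully before answering."
)


def build_prompt(
    task_input: str,
    cautionary: bool = False,
    exemplars: Optional[Sequence[Tuple[str, str]]] = None,
) -> str:
    """Assemble a prompt with an optional cautionary instruction and exemplars."""
    parts: List[str] = []
    if cautionary:
        parts.append(CAUTIONARY_INSTRUCTION)
    # Each exemplar is an (input, desired output) pair shown before the real input.
    for example_input, example_output in exemplars or ():
        parts.append(f"Input: {example_input}\nOutput: {example_output}")
    parts.append(f"Input: {task_input}\nOutput:")
    return "\n\n".join(parts)


def hallucination_rate(judgements: Sequence[bool]) -> float:
    """Return the percentage of responses judged hallucinated (True = hallucinated)."""
    if not judgements:
        raise ValueError("need at least one judgement")
    return 100.0 * sum(judgements) / len(judgements)


if __name__ == "__main__":
    # Hypothetical false-premise input and exemplar, for illustration only.
    prompt = build_prompt(
        "Water does not freeze at 0 degrees Celsius at sea level because",
        cautionary=True,
        exemplars=[
            (
                "The Sun does not emit light because",
                "The premise is false: the Sun does emit light.",
            )
        ],
    )
    print(prompt)
    # Judgements would come from human or model-based evaluation of the responses.
    print(hallucination_rate([True, False, False, True]))
```

Keeping prompt assembly separate from scoring means the same set of inputs can be run with and without each strategy and the resulting rates compared directly, which is the kind of comparison the summary describes.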