Abstract: Large Language Models (LLMs) have achieved remarkable performance across a wide variety of natural language tasks. However, they have been shown to suffer from a critical limitation pertinent to 'hallucination' in their output. Recent research has focused on investigating and addressing this problem for a variety of tasks such as biography generation, question answering, abstractive summarization, and dialogue generation. However, the crucial aspect pertaining to 'negation' has remained considerably underexplored. Negation is important because it adds depth and nuance to the understanding of language and is also crucial for logical reasoning and inference. In this work, we address the above limitation and particularly focus on studying the impact of negation in LLM hallucinations. Specifically, we study four tasks with negation: 'false premise completion', 'constrained fact generation', 'multiple choice question answering', and 'fact generation'. We show that open-source state-of-the-art LLMs such as LLaMA-2-chat, Vicuna, and Orca-2 hallucinate considerably on all these tasks involving negation, which underlines a critical shortcoming of these models. Addressing this problem, we further study numerous strategies to mitigate these hallucinations and demonstrate their impact.

What problem does this paper attempt to address?

This paper addresses the "hallucination" problem of large language models (LLMs) in language tasks involving negation. Specifically, the research focuses on the following aspects:

1. **Hallucination problem**: Although large language models perform well on many natural language processing tasks, they are prone to generating text that reads fluently but is factually wrong or does not match the input source. This phenomenon is called "hallucination".
2. **The influence of negation**: Negation is an important element of language. It adds depth and nuance to language understanding and is crucial for logical reasoning and inference. However, its influence on LLM hallucination has not been fully explored.
3. **Research tasks**: The authors study four tasks involving negation: False Premise Completion (FPC), Constrained Fact Generation (CFG), Multiple-Choice Question Answering (MCQA), and Fact Generation (FG). These tasks evaluate how well LLMs handle negation.
4. **Model performance**: Existing open-source LLMs such as LLaMA-2-chat, Vicuna-v1.5, and Orca-2 hallucinate substantially on all of these tasks. The reported hallucination rates are 63.77%, 72.33%, 36.6%, and 62.59% (four figures, which matches the number of tasks rather than the three models named here).
5. **Mitigation strategies**: To reduce hallucination, the authors study several mitigation strategies: Cautionary Instruction, Demonstrative Exemplars, Self-Refinement, and Knowledge-Augmented Generation. The experimental results show that all of these except Knowledge-Augmented Generation reduce hallucination to some extent.

### Summary

The main purpose of this paper is to investigate and mitigate the hallucination problem of LLMs in language tasks involving negation. Through detailed experiments on four specific tasks, the authors reveal the limitations of existing models in handling negation and propose mitigation strategies that reduce hallucination. This research provides a reference point for the future development of more robust LLMs.
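Two of the mitigation strategies named above, a cautionary instruction and demonstrative exemplars, come down to how the prompt is assembled. The following is a minimal sketch of that idea, not the authors' implementation: the instruction wording, the `Input:`/`Output:` layout, the function names `build_prompt` and `hallucination_rate`, and the sample question are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical wording; the paper's actual instruction text is not given here.
CAUTIONARY_INSTRUCTION = (
    "Note: the prompt may contain a negation or a false premise. "
    "Check the premise before answering and say so if it is false."
)


def build_prompt(
    task_input: str,
    cautionary: bool = False,
    exemplars: Sequence[Tuple[str, str]] = (),
) -> str:
    """Assemble a prompt with an optional cautionary instruction and exemplars."""
    parts: List[str] = []
    if cautionary:
        parts.append(CAUTIONARY_INSTRUCTION)
    # Each exemplar is an (input, desired output) pair shown before the real input.
    for example_input, example_output in exemplars:
        parts.append(f"Input: {example_input}\nOutput: {example_output}")
    parts.append(f"Input: {task_input}\nOutput:")
    return "\n\n".join(parts)


def hallucination_rate(labels: Sequence[bool]) -> float:
    """Percentage of responses judged hallucinated (True means hallucinated)."""
    if not labels:
        raise ValueError("labels must not be empty")
    return 100.0 * sum(labels) / len(labels)


if __name__ == "__main__":
    # Invented false-premise example, for illustration only.
    exemplar = (
        "Explain why the Sun does not emit light.",
        "The premise is false: the Sun does emit light.",
    )
    print(build_prompt(
        "Explain why water does not contain hydrogen.",
        cautionary=True,
        exemplars=[exemplar],
    ))
```

Comparing `hallucination_rate` over responses produced with and without these prompt additions is one simple way to measure whether a strategy helps. How each response is judged as hallucinated is left outside this sketch.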

Investigating and Addressing Hallucinations of LLMs in Tasks Involving Negation

Strong hallucinations from negation and how to fix them

LLMs Will Always Hallucinate, and We Need to Live With This

Sources of Hallucination by Large Language Models on Inference Tasks

Comprehending and Reducing LLM Hallucinations

Hallucination Detection and Hallucination Mitigation: An Investigation

Hallucination is Inevitable: An Innate Limitation of Large Language Models

A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models

Insights into Classifying and Mitigating LLMs' Hallucinations

Banishing LLM Hallucinations Requires Rethinking Generalization

Do LLMs Know about Hallucination? An Empirical Investigation of LLM's Hidden States

The Dawn After the Dark: An Empirical Study on Factuality Hallucination in Large Language Models

A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions

A Debate-Driven Experiment on LLM Hallucinations and Accuracy

The Troubling Emergence of Hallucination in Large Language Models -- An Extensive Definition, Quantification, and Prescriptive Remediations

The Two Sides of the Coin: Hallucination Generation and Detection with LLMs as Evaluators for LLMs

Alleviating Hallucinations of Large Language Models through Induced Hallucinations

Look Within, Why LLMs Hallucinate: A Causal Perspective

Don't Believe Everything You Read: Enhancing Summarization Interpretability through Automatic Identification of Hallucinations in Large Language Models

Unravelling the Mysteries of Hallucination in Large Language Models: Strategies for Precision in Artificial Intelligence Language Generation