Visually Dehallucinative Instruction Generation: Know What You Don't Know

Sungguk Cha, Jusung Lee, Younghyun Lee, Cheoljong Yang

2024-02-15

Abstract: "When did the emperor Napoleon invented iPhone?" Such hallucination-inducing questions are a well-known challenge in generative language modeling. In this study, we present a new concept of visual hallucination, referred to as "I Know (IK)" hallucination, to address scenarios where "I Don't Know" is the desired response. To tackle this issue, we propose the VQAv2-IDK benchmark, a subset of VQAv2 comprising unanswerable image-question pairs as determined by human annotators. Going a step further, we present a visually dehallucinative instruction generation method for IK hallucination and introduce the IDK-Instructions visual instruction database. Our experiments show that current methods struggle with IK hallucination, whereas our approach effectively reduces these hallucinations and proves versatile across different frameworks and datasets.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses a failure mode in Visual Question Answering (VQA) that the authors call "I Know (IK)" hallucination: a model produces a confident but unfounded answer to a question that cannot or should not be answered definitively. When a question is unanswerable, rests on a false premise, or leaves even human annotators uncertain, the model should recognize the uncertainty and respond with "I don't know." Existing generative models instead tend to produce an answer for every question, including unanswerable ones. To tackle this challenge, the authors make four contributions:

1. **Introducing the concept of IK hallucination**: Defining a new type of visual hallucination in which "I don't know" is the expected response.
2. **Proposing the VQAv2-IDK benchmark**: A subset of the VQAv2 dataset containing image-question pairs that human annotators marked as unanswerable or uncertain.
3. **Developing a visually dehallucinative instruction generation method**: Generating visual instructions that reduce IK hallucinations, and releasing them as the IDK-Instructions dataset.
4. **Experimental validation**: Showing that existing models are vulnerable to IK hallucination and that training with IDK-Instructions reduces it across different frameworks and datasets.

In summary, the paper aims to improve a model's ability to recognize uncertain or unanswerable questions and to avoid fabricated answers, thereby making its responses more reliable.
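The benchmark construction described above can be illustrated with a minimal sketch. It assumes VQAv2-style annotation records, where each record carries a `question_id`, an `image_id`, and an `answers` list of per-annotator answers. The phrase list `IDK_PHRASES`, the `min_idk_votes` threshold, and the helper names are hypothetical and chosen for illustration only; the paper's actual selection criteria may differ.

```python
from typing import Dict, List

# Hypothetical set of "I don't know"-style answers (an assumption, not the paper's list).
IDK_PHRASES = {
    "i don't know",
    "i dont know",
    "unknown",
    "not sure",
    "can't tell",
    "cannot tell",
    "unanswerable",
}


def is_idk(answer: str) -> bool:
    """Return True if a single annotator answer matches an IDK-style phrase."""
    normalized = answer.strip().lower().rstrip(".")
    return normalized in IDK_PHRASES


def select_idk_pairs(annotations: List[Dict], min_idk_votes: int = 1) -> List[Dict]:
    """Keep image-question pairs where enough annotators gave an IDK-style answer.

    `min_idk_votes` is an illustrative threshold, not a value taken from the paper.
    """
    selected = []
    for ann in annotations:
        votes = sum(is_idk(a["answer"]) for a in ann["answers"])
        if votes >= min_idk_votes:
            selected.append(
                {
                    "question_id": ann["question_id"],
                    "image_id": ann["image_id"],
                    "idk_votes": votes,
                }
            )
    return selected


def idk_response_rate(responses: List[str]) -> float:
    """Fraction of model responses that match an IDK-style phrase (0.0 if empty)."""
    if not responses:
        return 0.0
    return sum(is_idk(r) for r in responses) / len(responses)
```

On a subset built this way, `idk_response_rate` gives a rough exact-match proxy for how often a model declines to answer. A real evaluation would need more tolerant matching of free-form responses than this phrase lookup provides.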

