Visually Dehallucinative Instruction Generation: Know What You Don't Know

Sungguk Cha, Jusung Lee, Younghyun Lee, Cheoljong Yang
2024-02-15
Abstract: "When did the emperor Napoleon invented iPhone?" Such a hallucination-inducing question is a well-known challenge in generative language modeling. In this study, we present an innovative concept of visual hallucination, referred to as "I Know (IK)" hallucination, to address scenarios where "I Don't Know" is the desired response. To effectively tackle this issue, we propose the VQAv2-IDK benchmark, a subset of VQAv2 comprising unanswerable image-question pairs as determined by human annotators. Going a step further, we present the visually dehallucinative instruction generation method for IK hallucination and introduce the IDK-Instructions visual instruction database. Our experiments show that current methods struggle with IK hallucination. Yet, our approach effectively reduces these hallucinations, proving its versatility across different frameworks and datasets.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses a failure mode in Visual Question Answering (VQA) that the authors call "I Know (IK)" hallucination: a model produces a confident but unsupported answer to a question that cannot or should not be answered definitively. When a question is unanswerable, rests on a false premise, or leaves human annotators uncertain, the model should recognize this and respond with "I don't know." Existing generative models, however, tend to produce an answer for every question, even an unanswerable one.

To tackle this challenge, the authors make four contributions:

1. **Introducing the concept of IK hallucination**: Defining a new type of visual hallucination in which "I don't know" is the expected response but the model answers anyway.
2. **Proposing the VQAv2-IDK benchmark**: A subset of the VQAv2 dataset containing image-question pairs that human annotators marked as unanswerable or uncertain.
3. **Developing a visually dehallucinative instruction generation method**: Generating visual instructions that reduce IK hallucinations, and releasing the resulting IDK-Instructions dataset.
4. **Experimental validation**: Showing that existing models are vulnerable to IK hallucination and that IDK-Instructions reduces it across different frameworks and datasets.

In summary, the paper aims to improve a model's ability to recognize uncertain or unanswerable questions and to decline to answer them rather than fabricate a response, thereby making the model more reliable.
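
The benchmark construction described above (keeping VQAv2 image-question pairs whose human annotators signaled that the question is unanswerable) can be illustrated with a small sketch. This is not the authors' procedure: the phrase list `IDK_PHRASES`, the vote threshold `min_votes`, and the helper names are hypothetical choices made for illustration, and the summary does not state the actual selection criteria. The sketch only assumes the VQAv2 annotation layout, in which each entry carries a `question_id`, an `image_id`, and a list of per-annotator `answers`.

```python
from typing import Dict, List

# Hypothetical phrase list: the paper's actual selection criteria are not
# given in this summary, so these strings are illustrative assumptions.
IDK_PHRASES = (
    "i don't know",
    "unknown",
    "not sure",
    "can't tell",
    "cannot tell",
    "unanswerable",
)


def is_idk_answer(answer: str) -> bool:
    """Return True if one annotator's answer signals uncertainty."""
    text = answer.strip().lower()
    return any(phrase in text for phrase in IDK_PHRASES)


def select_idk_pairs(annotations: List[Dict], min_votes: int = 3) -> List[Dict]:
    """Keep pairs where at least `min_votes` annotators gave an IDK-style answer.

    `min_votes` is an assumed threshold, not a value taken from the paper.
    """
    selected = []
    for ann in annotations:
        votes = sum(is_idk_answer(a["answer"]) for a in ann["answers"])
        if votes >= min_votes:
            selected.append(
                {
                    "question_id": ann["question_id"],
                    "image_id": ann["image_id"],
                    "idk_votes": votes,
                }
            )
    return selected


# Toy annotations in the VQAv2 layout (real entries have ten answers each).
sample = [
    {
        "question_id": 1,
        "image_id": 10,
        "answers": [
            {"answer": "unknown"},
            {"answer": "can't tell"},
            {"answer": "not sure"},
            {"answer": "red"},
        ],
    },
    {
        "question_id": 2,
        "image_id": 11,
        "answers": [
            {"answer": "2"},
            {"answer": "2"},
            {"answer": "3"},
            {"answer": "unknown"},
        ],
    },
]

idk_subset = select_idk_pairs(sample)
print(idk_subset)
```

With the toy data, only the first pair is kept, because three of its four annotators gave an uncertainty-style answer. A real filter would likely need a more careful phrase list and threshold, since plain substring matching can misfire on answers that merely contain one of the phrases.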