PhD: A Prompted Visual Hallucination Evaluation Dataset

Jiazhen Liu, Yuhan Fu, Ruobing Xie, Runquan Xie, Xingwu Sun, Fengzong Lian, Zhanhui Kang, Xirong Li
DOI: https://doi.org/10.48550/arxiv.2403.11116
2024-01-01
Abstract: The rapid growth of Large Language Models (LLMs) has driven the development of Large Vision-Language Models (LVLMs). The challenge of hallucination, prevalent in LLMs, also emerges in LVLMs. However, most existing efforts focus mainly on object hallucination in LVLMs, ignoring the diverse types of LVLM hallucination. In this study, we delve into the Intrinsic Vision-Language Hallucination (IVL-Hallu) issue, thoroughly analyzing the causes and manifestations of different types of IVL-Hallu. Specifically, we propose several novel IVL-Hallu tasks and categorize them into four types: (a) object hallucination, which arises from the misidentification of objects; (b) attribute hallucination, which is caused by the misidentification of attributes; (c) multi-modal conflicting hallucination, which derives from contradictions between textual and visual information; and (d) counter-common-sense hallucination, which stems from contradictions between the LVLM's knowledge and actual images. Based on this taxonomy, we propose a more challenging benchmark named PhD to evaluate and explore IVL-Hallu. An automated pipeline is proposed for generating the different types of IVL-Hallu data. Extensive experiments on five SOTA LVLMs reveal their inability to effectively tackle the proposed IVL-Hallu tasks, with detailed analyses and insights on the origins of, and possible solutions to, these new and challenging tasks, facilitating future research on IVL-Hallu and LVLMs. The benchmark can be accessed at https://github.com/jiazhen-code/IntrinsicHallu.