AIGCs Confuse AI Too: Investigating and Explaining Synthetic Image-induced Hallucinations in Large Vision-Language Models

Yifei Gao, Jiaqi Wang, Zhiyu Lin, Jitao Sang
2024-09-03
Abstract: The evolution of Artificial Intelligence Generated Contents (AIGCs) is advancing towards higher quality. The growing interactions with AIGCs present a new challenge to the data-driven AI community: while AI-generated content has played a crucial role in a wide range of AI models, the potential hidden risks it introduces have not been thoroughly examined. Beyond human-oriented forgery detection, AI-generated content poses potential issues for AI models originally designed to process natural data. In this study, we underscore the exacerbated hallucination phenomena in Large Vision-Language Models (LVLMs) caused by AI-synthetic images. Remarkably, our findings shed light on a consistent AIGC hallucination bias: the object hallucinations induced by synthetic images are characterized by a greater quantity and a more uniform position distribution, even though these synthetic images do not manifest unrealistic or additional relevant visual features compared to natural images. Moreover, our investigations of the Q-former and the linear projector reveal that synthetic images may present token deviations after visual projection, thereby amplifying the hallucination bias.
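The abstract characterizes the hallucination bias by two properties: how many objects are hallucinated and how their positions are distributed. As an illustrative sketch only (not the paper's evaluation code), assuming objects have already been extracted from each generated caption and matched against the image's ground-truth objects, and assuming "position" means where a hallucinated object is mentioned within the caption, a CHAIR-style count and a normalized-entropy uniformity score could be computed as below. The helpers `hallucination_stats` and `position_uniformity`, the binning, and the entropy-based uniformity measure are all hypothetical choices.

```python
import math
from collections import Counter


def hallucination_stats(mentioned, ground_truth):
    """CHAIR-style statistics for a single generated caption.

    mentioned: objects in the order they appear in the caption (assumed to be
        already extracted and normalized, e.g. synonyms mapped to one label).
    ground_truth: set of objects actually present in the image.
    Returns (number of hallucinated mentions, CHAIR_i, relative positions).
    """
    last = max(len(mentioned) - 1, 1)
    positions = [
        i / last  # relative position of the hallucinated mention in [0, 1]
        for i, obj in enumerate(mentioned)
        if obj not in ground_truth
    ]
    # CHAIR_i: fraction of mentioned objects that are not in the image.
    chair_i = len(positions) / len(mentioned) if mentioned else 0.0
    return len(positions), chair_i, positions


def position_uniformity(positions, n_bins=4):
    """Normalized entropy of hallucination positions (1.0 = perfectly uniform).

    Hypothetical measure: positions are histogrammed into n_bins (> 1) equal bins.
    """
    if not positions:
        return 0.0
    counts = Counter(min(int(p * n_bins), n_bins - 1) for p in positions)
    total = len(positions)
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(n_bins)


# Toy caption: "car" and "bench" are mentioned but not present in the image.
mentioned = ["dog", "frisbee", "car", "tree", "bench"]
ground_truth = {"dog", "frisbee", "tree"}
count, chair_i, positions = hallucination_stats(mentioned, ground_truth)
print(count, chair_i, positions)  # 2 0.4 [0.5, 1.0]
print(position_uniformity(positions))  # 0.5
```

Aggregating these per-caption values over a set of natural images and a matched set of synthetic images would give one way to compare quantity and positional spread between the two groups; the paper's own protocol may differ.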
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses hallucination in Large Vision-Language Models (LVLMs) when they process synthetic images. The authors note that although AI-generated content (AIGC) plays an important role in many AI models, the risks that such content poses to models originally designed to process natural data have not been thoroughly studied. In LVLMs specifically, synthetic images make the model more prone to hallucination, that is, to describing objects that do not exist in the image or content that is entirely fictional. Compared with natural images, the hallucinations induced by synthetic images are more numerous and show a more uniform position distribution, even though the synthetic images themselves contain no unrealistic or additional relevant visual features.

To study the problem, the authors first build a hallucination evaluation setting involving synthetic images and, through a series of experiments, characterize the specific hallucination bias that LVLMs exhibit on such images. They then analyze the causes of this bias from the perspective of vision-text alignment and find that the current design of visual projection modules (the Q-former and the linear projector) may introduce token deviations for synthetic images, which in turn produce the hallucination bias. They also propose mitigation measures, such as turning off the Q-former projection or deepening the linear projection layer, to reduce the hallucination bias caused by synthetic images.

In summary, the paper investigates and explains the hallucinations that synthetic images induce in LVLMs and proposes possible remedies.
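The summary attributes part of the bias to token deviations introduced by the visual projection module. The text does not say how deviation is measured, so the following is only an illustrative sketch under stated assumptions: a single linear projector, toy 2-D visual features with hand-picked values, and cosine distance between the group-mean tokens of natural and synthetic images as a hypothetical deviation measure. The functions `project`, `mean_token`, `cosine_distance`, and `token_deviation` are hypothetical, and this is not the authors' analysis; it only shows how a linear map can enlarge a small difference between two groups of tokens.

```python
import math


def project(tokens, weight):
    """Linear projector: map each visual token (length d_in) into the LLM
    embedding space (length d_out). weight is a d_out x d_in nested list."""
    return [[sum(w * x for w, x in zip(row, tok)) for row in weight] for tok in tokens]


def mean_token(tokens):
    """Element-wise mean over a list of equal-length token vectors."""
    return [sum(col) / len(tokens) for col in zip(*tokens)]


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def token_deviation(natural, synthetic, weight=None):
    """Hypothetical deviation measure: cosine distance between the mean tokens
    of two image groups, after projection if weight is given, else before."""
    if weight is not None:
        natural, synthetic = project(natural, weight), project(synthetic, weight)
    return cosine_distance(mean_token(natural), mean_token(synthetic))


# Toy, hand-picked values (not from the paper): the projector stretches the
# second feature, so a small input difference becomes a larger output gap.
natural = [[1.0, 0.0], [1.0, 0.0]]
synthetic = [[1.0, 0.1], [1.0, 0.1]]
weight = [[1.0, 0.0], [0.0, 10.0]]
before = token_deviation(natural, synthetic)
after = token_deviation(natural, synthetic, weight)
print(f"before projection: {before:.4f}, after projection: {after:.4f}")
```

With these toy values the deviation grows from roughly 0.005 before projection to roughly 0.29 after it. Measuring the same quantity on both sides of the projector is one way to separate differences already present in the visual encoder output from differences that the projection step enlarges.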