Abstract: The evolution of Artificial Intelligence Generated Contents (AIGCs) is advancing towards higher quality. The growing interactions with AIGCs present a new challenge to the data-driven AI community: while AI-generated contents have played a crucial role in a wide range of AI models, the potential hidden risks they introduce have not been thoroughly examined. Beyond human-oriented forgery detection, AI-generated content poses potential issues for AI models originally designed to process natural data. In this study, we underscore the exacerbated hallucination phenomena in Large Vision-Language Models (LVLMs) caused by AI-synthetic images. Remarkably, our findings shed light on a consistent AIGC hallucination bias: the object hallucinations induced by synthetic images are characterized by a greater quantity and a more uniform position distribution, even though these synthetic images do not manifest unrealistic or additional relevant visual features compared to natural images. Moreover, our investigations of the Q-former and the linear projector reveal that synthetic images may present token deviations after visual projection, thereby amplifying the hallucination bias.

What problem does this paper attempt to address?

This paper addresses hallucination in Large Vision-Language Models (LVLMs) when they process synthetic images. The authors note that although AI-generated content (AIGC) plays an important role in many AI models, the risks it poses to models originally designed to process natural data have not been fully studied. In LVLMs specifically, synthetic images are more likely to trigger hallucinations, meaning the model describes objects that do not exist in the image or produces entirely fictional content. These hallucinations are both more numerous and more uniformly distributed in position, even though the synthetic images themselves contain no unrealistic or additional relevant visual features compared to natural images.

To study the problem, the authors first establish a hallucination evaluation environment involving synthetic images and, through a series of experiments, reveal the specific hallucination bias that LVLMs exhibit on such images. They then analyze the causes of this bias from the perspective of visual-text alignment. The study finds that the current design of the visual projection module may cause token deviations for synthetic images after projection, which in turn amplify the hallucination bias.

The authors also propose mitigation measures, such as turning off the Q-former projection or deepening the linear projection layer, to reduce the hallucination bias caused by synthetic images. In summary, the paper investigates and explains synthetic image-induced hallucinations in LVLMs and proposes possible remedies.
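To make the two reported properties of the hallucination bias (quantity and position distribution) concrete, the following is a minimal sketch of how they might be measured on a single generated description. It is not the paper's evaluation protocol, which is not described here. The function names `hallucinated_mentions` and `position_histogram`, the toy object vocabulary, and the single-word matching rule are all assumptions made for illustration.

```python
import re


def hallucinated_mentions(caption, present_objects, vocabulary):
    """Return (object, relative_position) pairs for vocabulary objects that
    the caption mentions but that are absent from the image annotation.

    Hypothetical helper: matching is exact and single-word, so plurals and
    multi-word object names are not handled.
    """
    words = re.findall(r"[a-z]+", caption.lower())
    if not words:
        return []
    present = {obj.lower() for obj in present_objects}
    vocab = {obj.lower() for obj in vocabulary}
    hits = []
    for index, word in enumerate(words):
        if word in vocab and word not in present:
            # Relative position in [0, 1]: 0 is the first word, 1 is the last.
            position = index / (len(words) - 1) if len(words) > 1 else 0.0
            hits.append((word, position))
    return hits


def position_histogram(positions, bins=4):
    """Count relative positions per equal-width bin over [0, 1]."""
    counts = [0] * bins
    for p in positions:
        counts[min(int(p * bins), bins - 1)] += 1
    return counts


# Toy example (assumed data): only "dog" and "frisbee" are really in the image.
vocabulary = ["dog", "frisbee", "bench", "car"]
caption = "A dog catches a frisbee near a bench while a car passes"
hits = hallucinated_mentions(caption, ["dog", "frisbee"], vocabulary)
histogram = position_histogram([position for _, position in hits])
print(len(hits), histogram)
```

Under these assumptions, running the same measurement over descriptions of natural images and of their synthetic counterparts would give a per-set hallucination count and a position histogram to compare; a flatter histogram would correspond to the more uniform position distribution described above.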

AIGCs Confuse AI Too: Investigating and Explaining Synthetic Image-induced Hallucinations in Large Vision-Language Models

Visual Hallucination: Definition, Quantification, and Prescriptive Remediations

Unravelling the Mysteries of Hallucination in Large Language Models: Strategies for Precision in Artificial Intelligence Language Generation

A Survey on Hallucination in Large Vision-Language Models

Cognitive Mirage: A Review of Hallucinations in Large Language Models

Embedding and Gradient Say Wrong: A White-Box Method for Hallucination Detection

LightHouse: A Survey of AGI Hallucination

IBD: Alleviating Hallucinations in Large Vision-Language Models via Image-Biased Decoding

Alleviating Hallucinations of Large Language Models through Induced Hallucinations

Evaluation and Analysis of Hallucination in Large Vision-Language Models

Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models

AGLA: Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention

Seeing is Believing: Mitigating Hallucination in Large Vision-Language Models via CLIP-Guided Decoding

From Pixels to Tokens: Revisiting Object Hallucinations in Large Vision-Language Models

Holistic Analysis of Hallucination in GPT-4V(ision): Bias and Interference Challenges

Visual Hallucinations of Multi-modal Large Language Models

Alleviating Hallucinations in Large Vision-Language Models through Hallucination-Induced Optimization