A Survey on Hallucination in Large Vision-Language Models

Hanchao Liu, Wenyuan Xue, Yifei Chen, Dapeng Chen, Xiutian Zhao, Ke Wang, Liping Hou, Rongjun Li, Wei Peng

2024-05-06

Abstract: Recent development of Large Vision-Language Models (LVLMs) has attracted growing attention within the AI landscape for its practical implementation potential. However, "hallucination", or more specifically, the misalignment between factual visual content and corresponding textual generation, poses a significant challenge to utilizing LVLMs. In this comprehensive survey, we dissect LVLM-related hallucinations in an attempt to establish an overview and facilitate future mitigation. Our scrutiny starts with a clarification of the concept of hallucinations in LVLMs, presenting a variety of hallucination symptoms and highlighting the unique challenges inherent in LVLM hallucinations. Subsequently, we outline the benchmarks and methodologies tailored specifically for evaluating hallucinations unique to LVLMs. Additionally, we delve into an investigation of the root causes of these hallucinations, encompassing insights from the training data and model components. We also critically review existing methods for mitigating hallucinations. The open questions and future directions pertaining to hallucinations within LVLMs are discussed to conclude this survey.

Computer Vision and Pattern Recognition, Computation and Language, Machine Learning

What problem does this paper attempt to address?

This paper addresses the "hallucination" problem in large vision-language models (LVLMs): the mismatch between the factual content of an image and the text the model generates about it. The authors survey hallucinations in LVLMs with the aim of giving an overview of the problem and supporting future mitigation work. They first define the concept of hallucination, noting that it can manifest as judgement errors or description errors, and illustrate the different hallucination symptoms with examples. The paper then discusses the benchmarks and evaluation methods specific to LVLM hallucinations, as well as the root causes of these hallucinations, including biases in training data, limitations of visual encoders, and modality alignment issues. It also reviews existing mitigation methods and proposes future research directions. Overall, the paper aims to improve understanding of hallucinations in LVLMs and to guide the development of more reliable and efficient models.

A Survey of Hallucination in Large Visual Language Models

A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions

Hallucination of Multimodal Large Language Models: A Survey

Evaluation and Analysis of Hallucination in Large Vision-Language Models

Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback

Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models

Evaluating Object Hallucination in Large Vision-Language Models

Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models

From Pixels to Tokens: Revisiting Object Hallucinations in Large Vision-Language Models

Devils in Middle Layers of Large Vision-Language Models: Interpreting, Detecting and Mitigating Object Hallucinations via Attention Lens

Reference-free Hallucination Detection for Large Vision-Language Models

A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models

Analyzing and Mitigating Object Hallucination in Large Vision-Language Models

Cognitive Mirage: A Review of Hallucinations in Large Language Models

A Unified Hallucination Mitigation Framework for Large Vision-Language Models