Abstract: This survey presents a comprehensive analysis of hallucination in multimodal large language models (MLLMs), also known as Large Vision-Language Models (LVLMs), which have demonstrated significant advances and remarkable abilities in multimodal tasks. Despite these promising developments, MLLMs often generate outputs that are inconsistent with the visual content, a challenge known as hallucination, which poses substantial obstacles to their practical deployment and raises concerns about their reliability in real-world applications. This problem has attracted increasing attention, prompting efforts to detect and mitigate such inaccuracies. We review recent advances in identifying, evaluating, and mitigating these hallucinations, offering a detailed overview of the underlying causes, evaluation benchmarks, metrics, and mitigation strategies. Additionally, we analyze the current challenges and limitations and formulate open questions that delineate potential pathways for future research. By providing a granular classification and landscape of hallucination causes, evaluation benchmarks, and mitigation methods, this survey aims to deepen the understanding of hallucinations in MLLMs and inspire further advances in the field. Through this in-depth review, we contribute to the ongoing dialogue on enhancing the robustness and reliability of MLLMs, providing insights and resources for researchers and practitioners alike. Resources are available at:

Object Hallucination Detection in Large Vision Language Models Via Evidential Conflict

Evaluating Object Hallucination in Large Vision-Language Models

Analyzing and Mitigating Object Hallucination in Large Vision-Language Models

Investigating and Mitigating Object Hallucinations in Pretrained Vision-Language (CLIP) Models

From Pixels to Tokens: Revisiting Object Hallucinations in Large Vision-Language Models

A Survey on Hallucination in Large Vision-Language Models

Evaluation and Analysis of Hallucination in Large Vision-Language Models

Logical Closed Loop: Uncovering Object Hallucinations in Large Vision-Language Models

VaLiD: Mitigating the Hallucination of Large Vision Language Models by Visual Layer Fusion Contrastive Decoding

Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback

Multi-Object Hallucination in Vision-Language Models

Hallucination of Multimodal Large Language Models: A Survey

Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training

A Survey of Hallucination in Large Visual Language Models

VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models

Seeing is Believing: Mitigating Hallucination in Large Vision-Language Models via CLIP-Guided Decoding

Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models