Abstract: Multi-modal large language models (MLLMs) have been shown to efficiently integrate natural language with visual information to handle multi-modal tasks. However, MLLMs still suffer from a fundamental limitation, hallucination: they tend to generate erroneous or fabricated information. In this paper, we address hallucinations in MLLMs from a novel perspective of representation learning. We first analyze the representation distributions of textual and visual tokens in MLLMs, revealing two important findings: 1) there is a significant gap between textual and visual representations, indicating unsatisfactory cross-modal representation alignment; 2) representations of texts with and without hallucinations are entangled, making them difficult to distinguish. These two observations inspire a simple yet effective method for mitigating hallucinations. Specifically, we introduce contrastive learning into MLLMs and use hallucinative text as hard negative examples, which naturally pulls the representations of non-hallucinative text and visual samples closer while pushing apart the representations of non-hallucinative and hallucinative text. We evaluate our method quantitatively and qualitatively, showing that it reduces hallucination occurrences and improves performance across multiple benchmarks. On the MMHal-Bench benchmark, our method achieves improvements of 34.66% and 29.5% over the MiniGPT-4 and LLaVA baselines, respectively. Our code is available at https://github.com/X-PLUG/mPLUG-HalOwl/tree/main/hacl.
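The contrastive objective described in the abstract can be illustrated with a small sketch. This is a minimal, hedged example, not the authors' implementation: it assumes one pooled embedding per image and per caption, cosine similarity, an image-to-text InfoNCE loss with in-batch negatives, exactly one hallucinative caption per image appended as an extra hard negative, and a hypothetical temperature of 0.07. The names `cosine` and `contrastive_loss_with_hard_negatives` are illustrative only.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def contrastive_loss_with_hard_negatives(
    visual: List[Vector],
    text: List[Vector],
    hallucinated: List[Vector],
    temperature: float = 0.07,  # hypothetical value, not taken from the paper
) -> float:
    """Image-to-text InfoNCE loss averaged over the batch (illustrative sketch).

    For image i, the positive is text[i] (a non-hallucinative caption); the
    negatives are the other in-batch captions text[j], j != i, plus
    hallucinated[i], a hallucinative caption of the same image used as a
    hard negative (an assumption about how the negatives are arranged).
    """
    n = len(visual)
    assert len(text) == n and len(hallucinated) == n
    total = 0.0
    for i in range(n):
        # Logits against every in-batch caption; index i is the positive.
        logits = [cosine(visual[i], text[j]) / temperature for j in range(n)]
        # Append the hallucinative caption of image i as an extra hard negative.
        logits.append(cosine(visual[i], hallucinated[i]) / temperature)
        # Numerically stable -log softmax of the positive logit.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denominator - logits[i]
    return total / n


if __name__ == "__main__":
    visual = [[1.0, 0.0], [0.0, 1.0]]
    text = [[0.9, 0.1], [0.1, 0.9]]
    # Hallucinative captions that stay close to their image (hard negatives).
    hard = [[0.8, 0.2], [0.2, 0.8]]
    # Captions pointing away from their image (easy negatives).
    easy = [[-1.0, 0.0], [0.0, -1.0]]
    loss_hard = contrastive_loss_with_hard_negatives(visual, text, hard)
    loss_easy = contrastive_loss_with_hard_negatives(visual, text, easy)
    print(loss_hard > loss_easy)  # True
```

Because a hallucinative caption usually remains semantically close to its image, its logit competes with the positive one and keeps the loss high until training pushes the two representations apart, whereas an unrelated caption contributes almost nothing. This is the intuition behind treating hallucinative text as a hard rather than a random negative.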

SLM Meets LLM: Balancing Latency, Interpretability and Consistency in Hallucination Detection

Mitigating Hallucination Issues in Small-Parameter LLMs Through Inter-Layer Contrastive Decoding

Developing a Reliable, General-Purpose Hallucination Detection and Mitigation Service: Insights and Lessons Learned

Cost-Effective Hallucination Detection for LLMs

Unsupervised Real-Time Hallucination Detection based on the Internal States of Large Language Models

Combating Multimodal LLM Hallucination via Bottom-up Holistic Reasoning

Learning to Trust Your Feelings: Leveraging Self-awareness in LLMs for Hallucination Mitigation

Prompt-Guided Internal States for Hallucination Detection of Large Language Models

Hallucination Detection and Hallucination Mitigation: An Investigation

Hallucination-aware Optimization for Large Language Model-empowered Communications

Banishing LLM Hallucinations Requires Rethinking Generalization

Do LLMs Know about Hallucination? An Empirical Investigation of LLM's Hidden States

The Dawn After the Dark: An Empirical Study on Factuality Hallucination in Large Language Models

Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback

Look Within, Why LLMs Hallucinate: A Causal Perspective

LLM Internal States Reveal Hallucination Risk Faced With a Query

Hallucination Detection: Robustly Discerning Reliable Answers in Large Language Models

Hallucination Detection in LLMs: Fast and Memory-Efficient Fine-Tuned Models

Hallucination Augmented Contrastive Learning for Multimodal Large Language Model

Detecting Hallucinations in Large Language Model Generation: A Token Probability Approach

Towards Detecting LLMs Hallucination via Markov Chain-based Multi-agent Debate Framework