Abstract: Recently, Large Vision-Language Models (LVLMs) have demonstrated impressive capabilities in multi-modal context comprehension. However, they still suffer from hallucination, that is, generating outputs that are inconsistent with the image content. To mitigate hallucinations, previous studies mainly focus on retraining LVLMs with custom datasets. Although effective, these approaches inherently incur additional computational costs. In this paper, we propose a training-free framework, \textbf{MVP}, that aims to reduce hallucinations by making the most of the innate capabilities of LVLMs via \textbf{M}ulti-\textbf{V}iew Multi-\textbf{P}ath Reasoning. Specifically, we first devise a multi-view information-seeking strategy to thoroughly perceive the information in the image, which enriches the general global information captured by the original vision encoder in LVLMs. Furthermore, during answer decoding, we observe that the occurrence of hallucinations correlates strongly with the certainty of the answer tokens. Thus, we propose multi-path reasoning for each information view, which quantifies and aggregates the certainty scores of each potential answer across multiple decoding paths and then decides the output answer. By fully grasping the information in the image and carefully considering the certainty of the potential answers when decoding, MVP can effectively reduce hallucinations in LVLMs. Extensive experiments verify that MVP significantly mitigates the hallucination problem across four well-known LVLMs. The source code is available at: \url{https://github.com/GasolSun36/MVP}.
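
The certainty-aggregation step described in the abstract can be illustrated with a minimal sketch. The abstract does not define the certainty score or the aggregation rule, so everything below is an assumption made for illustration: certainty is taken as the gap between the two most probable answer tokens, scores are summed per candidate answer over all views and paths, and the names `certainty`, `aggregate_answers`, and the example views are hypothetical rather than taken from the MVP code base.

```python
from collections import defaultdict


def certainty(token_probs):
    """Assumed certainty score: gap between the two most probable answer tokens.

    token_probs maps candidate tokens to probabilities at the answer position.
    This definition is an illustrative assumption, not the paper's formula.
    """
    top = sorted(token_probs.values(), reverse=True)
    runner_up = top[1] if len(top) > 1 else 0.0
    return top[0] - runner_up


def aggregate_answers(paths):
    """Sum certainty scores per candidate answer and return the best one.

    paths is a non-empty iterable of (answer, certainty_score) pairs, one per
    decoding path, pooled over all information views. Summation is an assumed
    aggregation rule.
    """
    totals = defaultdict(float)
    for answer, score in paths:
        totals[answer] += score
    best = max(totals, key=totals.get)
    return best, dict(totals)


# Hypothetical decoding paths for two hypothetical information views.
view_paths = {
    "global_view": [("yes", 0.62), ("no", 0.10)],
    "regional_view": [("yes", 0.48), ("no", 0.35)],
}
pooled = [pair for paths in view_paths.values() for pair in paths]
best, totals = aggregate_answers(pooled)
print(best, totals)
```

In this toy input, "yes" accumulates more certainty than "no" across both views, so it is selected. A weighted sum or a per-view vote would be equally plausible readings of the abstract.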

ConVis: Contrastive Decoding with Hallucination Visualization for Mitigating Hallucinations in Multimodal Large Language Models

Delve into Visual Contrastive Decoding for Hallucination Mitigation of Large Vision-Language Models

Hallucination Augmented Contrastive Learning for Multimodal Large Language Model

CODE: Contrasting Self-generated Description to Combat Hallucination in Large Multi-modal Models

Mitigating Hallucination Issues in Small-Parameter LLMs Through Inter-Layer Contrastive Decoding

Mitigating Hallucination in Visual-Language Models via Re-Balancing Contrastive Decoding

Alleviating Hallucinations in Large Vision-Language Models through Hallucination-Induced Optimization

VaLiD: Mitigating the Hallucination of Large Vision Language Models by Visual Layer Fusion Contrastive Decoding

MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation

Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding

Pensieve: Retrospect-then-Compare Mitigates Visual Hallucination

Seeing is Believing: Mitigating Hallucination in Large Vision-Language Models via CLIP-Guided Decoding

Mitigating Hallucinations in Large Vision-Language Models (LVLMs) via Language-Contrastive Decoding (LCD)

Mitigating Hallucinations in Large Vision-Language Models via Summary-Guided Decoding

DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations

IBD: Alleviating Hallucinations in Large Vision-Language Models via Image-Biased Decoding

Cracking the Code of Hallucination in LVLMs with Vision-aware Head Divergence

Alleviating Hallucinations of Large Language Models through Induced Hallucinations

Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning

Hallucination Improves the Performance of Unsupervised Visual Representation Learning