The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models?

Qinyu Zhao, Ming Xu, Kartik Gupta, Akshay Asthana, Liang Zheng, Stephen Gould

2024-07-17

Abstract: Large vision-language models (LVLMs), designed to interpret and respond to human instructions, occasionally generate hallucinated or harmful content due to inappropriate instructions. This study uses linear probing to shed light on the hidden knowledge at the output layers of LVLMs. We demonstrate that the logit distributions of the first tokens contain sufficient information to determine whether to respond to the instructions, including recognizing unanswerable visual questions, defending against jailbreaking attacks, and identifying deceptive questions. Such hidden knowledge is gradually lost in logits of subsequent tokens during response generation. Then, we illustrate a simple decoding strategy at the generation of the first token, effectively improving the generated content. In experiments, we find a few interesting insights: First, the CLIP model already contains a strong signal for solving these tasks, which indicates potential bias in the existing datasets. Second, we observe performance improvement by utilizing the first logit distributions on three additional tasks, including indicating uncertainty in math solving, mitigating hallucination, and image classification. Last, with the same training data, simply finetuning LVLMs improves models' performance but is still inferior to linear probing on these tasks.

Computer Vision and Pattern Recognition, Computation and Language

What problem does this paper attempt to address?

This paper addresses the tendency of large vision-language models (LVLMs) to generate hallucinated or harmful content when responding to human instructions. Specifically, the authors focus on three cases in which a model should not answer directly: unanswerable visual questions, jailbreak attacks, and deceptive questions. To study these cases, the paper uses linear probing to analyze the hidden knowledge at the output layer of LVLMs, in particular the logit distribution of the first generated token. The study finds that this distribution contains sufficient information to determine whether the model should respond to an instruction, and that this information is gradually lost in the logits of subsequent tokens. The same approach also improves performance on three additional tasks: indicating uncertainty in math solving, mitigating hallucination, and image classification. The experiments yield further insights. For example, the CLIP model already contains a strong signal for solving these tasks, which may indicate bias in existing datasets, and finetuning LVLMs on the same training data improves performance but remains inferior to linear probing. Finally, the authors propose a simple decoding strategy that uses a trained linear classifier to guide the generation of the first token, improving the generated content and the safety and reliability of the model.
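
To make the idea of probing first-token logits more concrete, the following is a minimal sketch and not the authors' implementation. It assumes synthetic vectors standing in for real LVLM first-token logits, and every name in it (`train_linear_probe`, `probe_score`, `guarded_response`, `REFUSAL`, the 0.5 threshold) is hypothetical. It fits a logistic-regression probe with plain gradient descent and then uses the probe's score to decide, before any further decoding, whether to return a refusal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for first-token logit vectors (NOT real LVLM outputs).
# Label 1 means "the model should not answer this instruction".
VOCAB, N = 50, 400
labels = rng.integers(0, 2, size=N)
direction = rng.normal(size=VOCAB)  # hypothetical signal direction in logit space
logits = rng.normal(size=(N, VOCAB)) + 1.5 * labels[:, None] * direction


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_linear_probe(x, y, lr=0.1, epochs=500):
    """Fit a logistic-regression probe on standardized logit vectors."""
    mean, std = x.mean(axis=0), x.std(axis=0) + 1e-8
    xs = (x - mean) / std
    w, b = np.zeros(xs.shape[1]), 0.0
    for _ in range(epochs):
        err = sigmoid(xs @ w + b) - y      # gradient of the log loss w.r.t. the score
        w -= lr * xs.T @ err / len(y)
        b -= lr * err.mean()
    return {"w": w, "b": b, "mean": mean, "std": std}


def probe_score(probe, x):
    """Probability, according to the probe, that the instruction should not be answered."""
    xs = (x - probe["mean"]) / probe["std"]
    return sigmoid(xs @ probe["w"] + probe["b"])


REFUSAL = "Sorry, I cannot answer this question."  # assumed refusal template


def guarded_response(probe, first_logits, threshold=0.5):
    """Return a refusal if the probe flags the input, else None (decode as usual)."""
    if probe_score(probe, first_logits) > threshold:
        return REFUSAL
    return None


# Train on the first 300 examples, evaluate on the remaining 100.
probe = train_linear_probe(logits[:300], labels[:300])
preds = probe_score(probe, logits[300:]) > 0.5
accuracy = (preds == labels[300:]).mean()
print(f"held-out probe accuracy: {accuracy:.2f}")
```

In a real setting, the rows of `logits` would be the logit vectors an LVLM produces for the first generated token of each prompt, and the labels would come from the task at hand (for example, answerable versus unanswerable). The synthetic data here is deliberately easy to separate, so the printed accuracy says nothing about the results reported in the paper.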

Embedding and Gradient Say Wrong: A White-Box Method for Hallucination Detection

Analyzing and Mitigating Object Hallucination in Large Vision-Language Models

Investigating and Mitigating the Multimodal Hallucination Snowballing in Large Vision-Language Models

From Pixels to Tokens: Revisiting Object Hallucinations in Large Vision-Language Models

On Large Language Models' Hallucination with Regard to Known Facts

Detecting Hallucinations in Large Language Model Generation: A Token Probability Approach

Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models

Teaching Large Language Models to Express Knowledge Boundary from Their Own Signals

IBD: Alleviating Hallucinations in Large Vision-Language Models via Image-Biased Decoding

DAMRO: Dive into the Attention Mechanism of LVLM to Reduce Object Hallucination

Seeing is Believing: Mitigating Hallucination in Large Vision-Language Models via CLIP-Guided Decoding

Alleviating Hallucinations of Large Language Models through Induced Hallucinations

Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding

Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training

A Unified Hallucination Mitigation Framework for Large Vision-Language Models

Visual Hallucinations of Multi-modal Large Language Models

Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback

VaLiD: Mitigating the Hallucination of Large Vision Language Models by Visual Layer Fusion Contrastive Decoding