VLSBench: Unveiling Visual Leakage in Multimodal Safety

Xuhao Hu, Dongrui Liu, Hao Li, Xuanjing Huang, Jing Shao

2024-11-30

Abstract: Safety concerns of multimodal large language models (MLLMs) have gradually become an important problem in various applications. Surprisingly, previous works indicate a counter-intuitive phenomenon: using textual unlearning to align MLLMs achieves safety performance comparable to that of MLLMs trained with image-text pairs. To explain this phenomenon, we discover a visual safety information leakage (VSIL) problem in existing multimodal safety benchmarks, i.e., the potentially risky and sensitive content in the image has been revealed in the textual query. As a result, MLLMs can easily refuse these sensitive text-image queries based on the textual query alone. However, image-text pairs without VSIL are common in real-world scenarios and are overlooked by existing multimodal safety benchmarks. To this end, we construct a multimodal visual leakless safety benchmark (VLSBench) with 2.4k image-text pairs, which prevents visual safety leakage from the image to the textual query. Experimental results indicate that VLSBench poses a significant challenge to both open-source and closed-source MLLMs, including LLaVA, Qwen2-VL, Llama3.2-Vision, and GPT-4o. This study demonstrates that textual alignment is enough for multimodal safety scenarios with VSIL, while multimodal alignment is a more promising solution for multimodal safety scenarios without VSIL. Please see our code and data at: http://hxhcreate.github.io/VLSBench

Cryptography and Security, Artificial Intelligence, Computation and Language, Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses a phenomenon called "Visual Safety Information Leakage" (VSIL) in existing multimodal safety benchmarks. VSIL occurs when the sensitive or risky content of an image is also revealed in the accompanying text query, so a multimodal large language model (MLLM) can refuse the request by relying on the text alone, without understanding or perceiving the image. This explains why text-only alignment methods (such as text-only fine-tuning) can achieve safety performance comparable to that of methods that align on image-text pairs, such as supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), even though the latter require more data collection and computation. To evaluate the safety of MLLMs more accurately when VSIL is absent, the authors construct a new multimodal Visual Leakage-free Safety Benchmark (VLSBench), which contains 2,400 image-text pairs and prevents visual safety information from leaking from the image into the text query. The experimental results show that VLSBench poses a significant challenge to existing open-source and closed-source MLLMs. In particular, when VSIL is absent, multimodal alignment performs better than text-only alignment. This indicates that multimodal alignment is the more promising solution in practice, especially for multimodal safety scenarios without VSIL.
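To make the idea of leakage concrete, here is a minimal Python sketch of a keyword-overlap check. It is not the authors' pipeline: the `Sample` type, its `image_risk_terms` field (a hypothetical annotation listing the risky concepts visible in the image), and the functions `has_visual_leakage` and `leakage_rate` are all assumptions made for illustration. A real check would need to handle paraphrase and synonyms, which simple word matching cannot.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    # Hypothetical annotation: risky concepts that are visible in the image.
    image_risk_terms: List[str]
    # The text query that accompanies the image.
    text_query: str


def has_visual_leakage(sample: Sample) -> bool:
    """Return True if any risky term from the image also appears,
    as a whole word or phrase, in the text query (case-insensitive)."""
    query = sample.text_query.lower()
    for term in sample.image_risk_terms:
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        if re.search(pattern, query):
            return True
    return False


def leakage_rate(samples: List[Sample]) -> float:
    """Fraction of samples whose text query reveals a risky image term."""
    if not samples:
        return 0.0
    leaked = sum(1 for s in samples if has_visual_leakage(s))
    return leaked / len(samples)
```

Under these assumptions, a query such as "How do I use this lock pick on my neighbor's door?" paired with an image annotated `["lock pick"]` counts as leaky, because the text alone exposes the risk. Rephrasing it as "What is the best way to use this tool on the door across the hall?" removes the overlap, so the risk can only be recognized by looking at the image.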


MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models

Safety Alignment for Vision Language Models

SafeBench: A Safety Evaluation Framework for Multimodal Large Language Models

Multimodal Situational Safety

MM-SpuBench: Towards Better Understanding of Spurious Biases in Multimodal LLMs

Unraveling and Mitigating Safety Alignment Degradation of Vision-Language Models

How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs

Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation

Safety of Multimodal Large Language Models on Images and Texts

Cross-Modal Safety Mechanism Transfer in Large Vision-Language Models

Cross-Modal Safety Alignment: Is textual unlearning all you need?

MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal Large Language Models

SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model

Safe + Safe = Unsafe? Exploring How Safe Images Can Be Exploited to Jailbreak Large Vision-Language Models

Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs

MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of MLLMs

ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time

Insight Over Sight? Exploring the Vision-Knowledge Conflicts in Multimodal LLMs

BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions

Query-Relevant Images Jailbreak Large Multi-Modal Models