Abstract: The emergence of Vision-Language Models (VLMs) represents a significant advancement in integrating computer vision with Large Language Models (LLMs) to generate detailed text descriptions from visual inputs. Despite their growing importance, the security of VLMs, particularly against backdoor attacks, is underexplored. Moreover, prior works often assume attackers have access to the original training data, which is often unrealistic. In this paper, we address a more practical and challenging scenario where attackers must rely solely on Out-Of-Distribution (OOD) data. We introduce VLOOD (Backdooring Vision-Language Models with Out-of-Distribution Data), a novel approach with two key contributions: (1) demonstrating backdoor attacks on VLMs in complex image-to-text tasks while minimizing degradation of the original semantics under poisoned inputs, and (2) proposing innovative techniques for backdoor injection without requiring any access to the original training data. Our evaluation on image captioning and visual question answering (VQA) tasks confirms the effectiveness of VLOOD, revealing a critical security vulnerability in VLMs and laying the foundation for future research on securing multimodal models against sophisticated threats.


### What problem does this paper attempt to solve?

This paper addresses the security of Vision-Language Models (VLMs) against backdoor attacks, in particular when the attacker can only use Out-Of-Distribution (OOD) data whose distribution differs from that of the original training data. Specifically, the paper covers the following points:

1. **Limitations of existing research**: Most existing backdoor attack studies assume that the attacker can access the original training data, which is often unrealistic in practice. In addition, few backdoor attack studies target the complex image-to-text generation tasks of VLMs.
2. **Introduction of a new method**: To meet these challenges, the paper proposes a new backdoor attack method, VLOOD (Backdooring Vision-Language Models with Out-of-Distribution Data). The method injects backdoors in complex image-to-text generation tasks while minimizing semantic degradation, and it does not require access to the original training data.
3. **Key contributions**:
   - **First exploration**: This is the first attempt to perform backdoor attacks on VLMs using OOD data in a practical scenario.
   - **Innovative techniques**: The paper proposes Clean Knowledge Preservation (CKP) and Conceptual Consistency Preservation (CCP), together with a dynamic weight adjustment mechanism, so that the model maintains high semantic consistency when processing poisoned inputs.
   - **Evaluation and verification**: Experiments on image captioning and Visual Question Answering (VQA) tasks demonstrate the effectiveness of VLOOD and reveal a critical security vulnerability in VLMs.

### Formula summary

- **CKP loss function**:
  \[ L_{\text{CKP}}=\text{KL}(F(I, T)\parallel\tilde{F}(I, T)) = \frac{1}{N}\sum_{(I, T, O)\in D}F(I, T)\log\frac{F(I, T)}{\tilde{F}(I, T)} \]
  where \((I, T, O)\in D\) are clean samples and \(N\) is the number of clean samples.
- **CCP loss function**:
  \[ S = \frac{1}{n}\sum_{i = 1}^{n}\|a_i - x_i\|_1 \]
  \[ L_{\text{CCP}}=\frac{1}{N}\sum_{(\tilde{I},\tilde{T},\tilde{O})\in\tilde{D}}\left(\frac{1}{1+\exp(-S)}\right) \]
- **Dynamic weight adjustment**:
  \[ \lambda=\lambda+(\text{Impact}_{\text{clean}}-\text{Impact}_{\text{poisoned}}) \]
- **Overall loss function**:
  \[ L=(1 - \lambda)\cdot(L_{\text{LM}}(\text{clean})+L_{\text{CKP}})+\lambda\cdot(L_{\text{LM}}(\text{poisoned})+L_{\text{CCP}}) \]

With these techniques, VLOOD injects backdoors while preserving the model's normal behavior, which exposes a security challenge for VLMs.
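
To make the formula summary above concrete, the following is a minimal sketch in plain Python of how the four quantities could be computed. It is not the authors' implementation. Several points are assumptions: \(F\) and \(\tilde{F}\) are treated as discrete output distributions of a reference model and the backdoored model, \(a_i\) and \(x_i\) are treated as paired embedding vectors (the summary does not define them), the function names are hypothetical, and clamping \(\lambda\) to \([0, 1]\) is an addition that the summary does not state.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def ckp_loss(reference_outputs, backdoored_outputs):
    """L_CKP: mean of KL(F(I, T) || F~(I, T)) over the N clean samples."""
    pairs = list(zip(reference_outputs, backdoored_outputs))
    return sum(kl_divergence(p, q) for p, q in pairs) / len(pairs)


def l1_score(a, x):
    """S: mean L1 distance over the n vector pairs (a_i, x_i)."""
    dists = [sum(abs(u - v) for u, v in zip(a_i, x_i)) for a_i, x_i in zip(a, x)]
    return sum(dists) / len(dists)


def ccp_loss(a_batch, x_batch):
    """L_CCP: mean of sigmoid(S) over the poisoned samples."""
    scores = [l1_score(a, x) for a, x in zip(a_batch, x_batch)]
    return sum(1.0 / (1.0 + math.exp(-s)) for s in scores) / len(scores)


def update_lambda(lam, impact_clean, impact_poisoned):
    """lambda <- lambda + (Impact_clean - Impact_poisoned)."""
    lam = lam + (impact_clean - impact_poisoned)
    # Assumption: clamp to [0, 1] so both loss weights stay non-negative.
    return min(1.0, max(0.0, lam))


def total_loss(lam, lm_clean, l_ckp, lm_poisoned, l_ccp):
    """L = (1 - lambda) * (L_LM(clean) + L_CKP) + lambda * (L_LM(poisoned) + L_CCP)."""
    return (1.0 - lam) * (lm_clean + l_ckp) + lam * (lm_poisoned + l_ccp)


if __name__ == "__main__":
    # Toy values only; they are not taken from the paper.
    l_ckp = ckp_loss([[0.7, 0.3]], [[0.6, 0.4]])
    l_ccp = ccp_loss([[[0.1, 0.2], [0.3, 0.4]]], [[[0.0, 0.2], [0.3, 0.5]]])
    lam = update_lambda(0.5, impact_clean=0.3, impact_poisoned=0.1)
    print(total_loss(lam, 1.2, l_ckp, 1.5, l_ccp))
```

Because \(S \ge 0\), each sigmoid term lies in \([0.5, 1)\), so in this sketch `ccp_loss` reaches its minimum of 0.5 when every \(a_i\) equals its \(x_i\).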

Backdooring Vision-Language Models with Out-Of-Distribution Data

TrojVLM: Backdoor Attack Against Vision Language Models

Exploring Visual Vulnerabilities via Multi-Loss Adversarial Search for Jailbreaking Vision-Language Models

Chain of Attack: On the Robustness of Vision-Language Models Against Transfer-Based Adversarial Attacks

Data Stealing Attacks against Large Language Models via Backdooring

On Evaluating Adversarial Robustness of Large Vision-Language Models

A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends

Safeguarding Vision-Language Models Against Patched Visual Prompt Injectors

Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models

Defending LVLMs Against Vision Attacks through Partial-Perception Supervision

Revisiting Backdoor Attacks against Large Vision-Language Models from Domain Shift

Visual Adversarial Examples Jailbreak Aligned Large Language Models

Visual Adversarial Attack on Vision-Language Models for Autonomous Driving

Safety Alignment for Vision Language Models

Break the Visual Perception: Adversarial Attacks Targeting Encoded Visual Tokens of Large Vision-Language Models

Mitigating Backdoor Threats to Large Language Models: Advancement and Challenges

AnyAttack: Targeted Adversarial Attacks on Vision-Language Models toward Any Images

White-box Multimodal Jailbreaks Against Large Vision-Language Models

A Comprehensive Overview of Backdoor Attacks in Large Language Models within Communication Networks

Towards Adversarially Robust Vision-Language Models: Insights from Design Choices and Prompt Formatting Techniques

VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models