Abstract: Large language models have become increasingly prominent, also signaling a shift towards multimodality as the next frontier in artificial intelligence, where their embeddings are harnessed as prompts to generate textual content. Vision-language models (VLMs) stand at the forefront of this advancement, offering innovative ways to combine visual and textual data for enhanced understanding and interaction. However, this integration also enlarges the attack surface. Patch-based adversarial attacks are considered the most realistic threat model in physical vision applications, as demonstrated in much of the existing literature. In this paper, we propose to address patched visual prompt injection, where adversaries exploit adversarial patches to generate target content in VLMs. Our investigation reveals that patched adversarial prompts exhibit sensitivity to pixel-wise randomization, a trait that remains robust even against adaptive attacks designed to counteract such defenses. Leveraging this insight, we introduce SmoothVLM, a defense mechanism rooted in smoothing techniques, specifically tailored to protect VLMs from the threat of patched visual prompt injectors. Our framework significantly lowers the attack success rate to a range between 0% and 5.0% on two leading VLMs, while achieving around 67.3% to 95.0% context recovery of the benign images, demonstrating a balance between security and usability.

What problem does this paper attempt to address?

The problem this paper addresses is the security of vision-language models (VLMs) against patch-based adversarial attacks. Specifically, an attacker inserts an adversarial patch into an image so that the VLM generates the attacker's target content. This threatens the security and reliability of the VLM and creates risks in practical applications.

### Main problems

1. **Adversarial patch injection**: Attackers use adversarial patches to manipulate the output of the VLM, making it generate specific harmful content.
2. **Insufficiency of existing defense mechanisms**: Existing defense mechanisms have limited effectiveness against complex adversarial attacks, especially in the physical world, where patch attacks are more realistic and effective.

### Solutions

To address these problems, the paper proposes a framework named SmoothVLM, which aims to make the VLM more robust against adversarial patch attacks through randomization-based smoothing techniques. The specific contributions are as follows:

1. **Proposing a new defense mechanism**: SmoothVLM reduces the effectiveness of adversarial patches by introducing random perturbations (such as random masks and pixel swapping). Experimental results show that SmoothVLM lowers the attack success rate to between 0% and 5%.
2. **Improving context restoration ability**: While protecting the model, SmoothVLM also maintains a high context restoration rate on benign images, preserving the model's usability.
3. **Theoretical analysis and experimental verification**: The paper supports the effectiveness of SmoothVLM with mathematical derivations and extensive experiments, showing that it performs well under different types of adversarial attacks.
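The defense idea described above (perturb pixels at random, query the model several times, aggregate) can be illustrated with a minimal sketch. This is not the paper's implementation: `perturb_pixels`, `smoothed_generate`, the `vlm` and `is_injected` callables, the majority-vote aggregation, and the toy model at the bottom are all assumptions introduced here for illustration.

```python
import numpy as np

def perturb_pixels(image, ratio, rng):
    """Return a copy of `image` with a random fraction `ratio` of pixels zeroed."""
    mask = rng.random(image.shape[:2]) < ratio  # True where a pixel is masked
    out = image.copy()
    out[mask] = 0
    return out

def smoothed_generate(vlm, is_injected, image, prompt,
                      n_copies=9, ratio=0.3, seed=0):
    """Query `vlm` on several randomly masked copies and majority-vote.

    `vlm(image, prompt) -> str` and `is_injected(response) -> bool` are
    hypothetical callables supplied by the caller (assumptions of this sketch).
    Returns (response, flagged), where `response` agrees with the majority.
    """
    rng = np.random.default_rng(seed)
    responses = [vlm(perturb_pixels(image, ratio, rng), prompt)
                 for _ in range(n_copies)]
    flags = [bool(is_injected(r)) for r in responses]
    majority_injected = 2 * sum(flags) > n_copies  # ties count as benign
    for response, flag in zip(responses, flags):
        if flag == majority_injected:
            return response, majority_injected

# Toy stand-in for a VLM: the "attack" only fires if the 4x4 patch is intact.
def toy_vlm(image, prompt):
    return "attack" if np.all(image[:4, :4] == 1.0) else "benign"

def toy_detector(response):
    return response == "attack"

image = np.full((8, 8, 3), 0.5)
image[:4, :4] = 1.0  # adversarial patch region in this toy setup

plain = toy_vlm(image, "describe the image")
smoothed, flagged = smoothed_generate(toy_vlm, toy_detector, image,
                                      "describe the image")
```

In this toy setup the unperturbed image triggers the attack. A masked copy keeps the patch intact with probability about 0.7^16 (roughly 0.3%), so the majority vote comes out benign. A real patch would not be this brittle; the toy only shows how the aggregation works.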
### Key formulas

- Optimization objective of the adversarial patch:

  \[ \arg \min_{x_{\text{adv}}} d(H_{\text{adv}}, H_{\text{target}}) \]

  where \(H\) represents the visual embedding and \(d\) is the distance metric in the embedding space.

- Attack success rate (ASR) after random perturbation:

  \[ \text{ASR} = \Pr[(\text{VPI} \circ \text{VLM})([I \oplus P'; \emptyset]) = 1] \]

  where \(P'\) is the adversarial patch after random perturbation.

- Defense success probability (DSP) of SmoothVLM:

  \[ \text{DSP}([I \oplus P; \emptyset]) = \Pr[(\text{VPI} \circ \text{SmoothVLM})([I \oplus P; \emptyset]) = 0] \]

### Summary

With the SmoothVLM framework, this paper addresses the security of VLMs against patch-based adversarial attacks. SmoothVLM substantially reduces the attack success rate while preserving the model's usability and context restoration ability, offering new ideas and methods for future security research on multimodal language models.
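To make the relationship between the ASR and DSP definitions concrete, the sketch below computes the DSP under two simplifying assumptions that do not come from the source: each perturbed copy is attacked successfully independently with the same per-copy ASR, and the defense aggregates by majority vote with ties counted as benign. The function name `dsp_majority_vote` is hypothetical.

```python
from math import comb

def dsp_majority_vote(asr_per_copy, n_copies):
    """Probability that at most half of `n_copies` independent copies are attacked.

    Assumes (for illustration only) i.i.d. copies with success probability
    `asr_per_copy` and a majority vote in which a tie counts as a defense success.
    """
    a = asr_per_copy
    return sum(
        comb(n_copies, k) * a**k * (1 - a) ** (n_copies - k)
        for k in range(n_copies + 1)
        if 2 * k <= n_copies  # attacked copies do not form a strict majority
    )

single = dsp_majority_vote(0.2, 1)  # one copy: DSP is just 1 - ASR
voted = dsp_majority_vote(0.2, 9)   # nine copies with majority vote
```

Under these assumptions, voting over more copies raises the DSP whenever the per-copy ASR is below one half. This is the intuition for why smoothing helps once random perturbation has weakened the patch.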

Safeguarding Vision-Language Models Against Patched Visual Prompt Injectors
