Safeguarding Vision-Language Models Against Patched Visual Prompt Injectors

Jiachen Sun, Changsheng Wang, Jiongxiao Wang, Yiwei Zhang, Chaowei Xiao
2024-08-24
Abstract: Large language models have become increasingly prominent, also signaling a shift towards multimodality as the next frontier in artificial intelligence, where their embeddings are harnessed as prompts to generate textual content. Vision-language models (VLMs) stand at the forefront of this advancement, offering innovative ways to combine visual and textual data for enhanced understanding and interaction. However, this integration also enlarges the attack surface. Patch-based adversarial attacks are considered the most realistic threat model in physical vision applications, as demonstrated in much of the existing literature. In this paper, we propose to address patched visual prompt injection, where adversaries exploit adversarial patches to generate target content in VLMs. Our investigation reveals that patched adversarial prompts exhibit sensitivity to pixel-wise randomization, a trait that remains robust even against adaptive attacks designed to counteract such defenses. Leveraging this insight, we introduce SmoothVLM, a defense mechanism rooted in smoothing techniques, specifically tailored to protect VLMs from the threat of patched visual prompt injectors. Our framework significantly lowers the attack success rate to a range between 0% and 5.0% on two leading VLMs, while achieving around 67.3% to 95.0% context recovery of the benign images, demonstrating a balance between security and usability.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the security of vision-language models (VLMs) against patch-based adversarial attacks. Specifically, an attacker can insert an adversarial patch into an image so that the VLM generates the target content the attacker intends. Such attacks undermine the security and reliability of VLMs and pose risks in practical applications.

### Main problems

1. **Adversarial patch injection**: Attackers use adversarial patches to manipulate the output of the VLM, making it generate specific harmful content.
2. **Insufficiency of existing defense mechanisms**: Existing defenses have limited effectiveness against patch-based attacks, which are considered the most realistic threat model in physical-world vision applications.

### Solutions

To address these problems, the paper proposes a framework named SmoothVLM, which aims to make VLMs robust to adversarial patch attacks through randomized smoothing techniques. The specific contributions are as follows:

1. **Proposing a new defense mechanism**: SmoothVLM reduces the effectiveness of adversarial patches by introducing random perturbations (such as random masks and pixel swapping). Experimental results show that SmoothVLM lowers the attack success rate to between 0% and 5.0% on two leading VLMs.
2. **Preserving context recovery**: While protecting the model, SmoothVLM still recovers around 67.3% to 95.0% of the context of benign images, which keeps the model usable.
3. **Theoretical analysis and experimental verification**: The paper supports SmoothVLM with mathematical analysis and extensive experiments, showing that it performs well under different types of adversarial attacks, including adaptive ones.
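The perturb-and-aggregate idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`random_mask`, `random_swap`, `smooth_vlm`), the majority-vote aggregation, the default sample count and perturbation rate, and the toy model and detector are all hypothetical stand-ins chosen for the example.

```python
import numpy as np


def random_mask(image, rate, rng):
    """Zero out each pixel location independently with probability `rate`."""
    h, w = image.shape[:2]
    drop = rng.random((h, w)) < rate  # one coin flip per pixel location
    out = image.copy()
    out[drop] = 0
    return out


def random_swap(image, rate, rng):
    """Swap the values of randomly chosen, disjoint pixel pairs."""
    h, w = image.shape[:2]
    flat = image.reshape(h * w, -1).copy()
    n_pairs = int(rate * h * w) // 2  # roughly `rate` of all pixels move
    idx = rng.choice(h * w, size=2 * n_pairs, replace=False)
    a, b = idx[:n_pairs], idx[n_pairs:]
    flat[a], flat[b] = flat[b].copy(), flat[a].copy()
    return flat.reshape(image.shape)


def smooth_vlm(image, vlm, is_injected, perturb=random_mask,
               n_samples=10, rate=0.2, seed=0):
    """Query `vlm` on perturbed copies and return a majority-consistent response.

    `vlm` maps an image to text; `is_injected` maps text to True/False.
    Both are caller-supplied callables (assumptions of this sketch).
    """
    rng = np.random.default_rng(seed)
    responses = [vlm(perturb(image, rate, rng)) for _ in range(n_samples)]
    flags = [bool(is_injected(r)) for r in responses]
    majority = sum(flags) > n_samples / 2
    # Pick any response whose verdict agrees with the majority vote.
    chosen = next(r for r, f in zip(responses, flags) if f == majority)
    return chosen, sum(flags) / n_samples


# --- Toy stand-ins (hypothetical, for illustration only) ---
PATCH = np.full((8, 8, 3), 255, dtype=np.uint8)


def toy_vlm(image):
    """Pretend model: emits the attacker's text only if the patch is intact."""
    if np.array_equal(image[:8, :8], PATCH):
        return "TARGET"
    return "a photo of a street"


def toy_is_injected(response):
    """Pretend detector: flags the attacker's target text."""
    return response == "TARGET"


benign = np.full((32, 32, 3), 128, dtype=np.uint8)
patched = benign.copy()
patched[:8, :8] = PATCH

response, flagged_fraction = smooth_vlm(patched, toy_vlm, toy_is_injected)
```

In this toy setup the undefended call `toy_vlm(patched)` returns the attacker's text, while `smooth_vlm` returns the benign caption because masking almost always breaks at least one patch pixel. The returned `flagged_fraction` is an empirical estimate of how often a perturbed input still triggers the injection. A real VLM would be far less brittle than the toy model, so the sample count and perturbation rate would need tuning.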
### Key formulas

- Optimization objective of the adversarial patch:
  \[ \arg \min_{x_{\text{adv}}} d(H_{\text{adv}}, H_{\text{target}}) \]
  where \(H\) represents the visual embedding and \(d\) is the distance metric in the embedding space.
- Attack success rate (ASR) after random perturbation:
  \[ \text{ASR} = \Pr[(\text{VPI} \circ \text{VLM})([I \oplus P'; \emptyset]) = 1] \]
  where \(P'\) is the adversarial patch after random perturbation.
- Defense success probability (DSP) of SmoothVLM:
  \[ \text{DSP}([I \oplus P; \emptyset]) = \Pr[(\text{VPI} \circ \text{SmoothVLM})([I \oplus P; \emptyset]) = 0] \]

### Summary

By proposing the SmoothVLM framework, this paper addresses the security of VLMs against patch-based adversarial attacks. SmoothVLM significantly reduces the attack success rate while maintaining the model's usability and context recovery, and it offers a starting point for future security research on multimodal language models.