Abstract: Vision-Language Models (VLMs) have witnessed a surge in both research and real-world applications. However, as they are becoming increasingly prevalent, ensuring their robustness against adversarial attacks is paramount. This work systematically investigates the impact of model design choices on the adversarial robustness of VLMs against image-based attacks. Additionally, we introduce novel, cost-effective approaches to enhance robustness through prompt formatting. By rephrasing questions and suggesting potential adversarial perturbations, we demonstrate substantial improvements in model robustness against strong image-based attacks such as Auto-PGD. Our findings provide important guidelines for developing more robust VLMs, particularly for deployment in safety-critical environments.

What problem does this paper attempt to address?

This paper addresses the robustness of Vision-Language Models (VLMs) against image-based adversarial attacks. The authors focus on two aspects:

1. **The impact of model design choices on adversarial robustness**
   - The authors systematically study how design choices (the choice of visual encoder, the input resolution, the scale of the language model, and the combination of multiple visual encoders) affect the robustness of VLMs under white-box adversarial attacks.
   - Threat model: the attacker has full access to the model parameters and constructs adversarial examples from gradient information. Formally, the goal is to find a perturbation \(\delta\) that makes the model err by maximizing the loss \(L(f(x + \delta), y)\), where \(f\) is the model, \(x\) is the original input, \(y\) is the original label, and \(\delta\) satisfies the constraint \(\|\delta\|_\infty \leq \epsilon\).
2. **Enhancing adversarial robustness through prompt formatting**
   - The authors introduce a novel, cost-effective way to improve robustness by rephrasing the question and hinting at potential adversarial perturbations in the prompt.
   - The experiments compare several prompt formats (the original prompt, an adversarial-deterministic prompt, an adversarial-likelihood prompt, and a random prompt) and evaluate their effect on image captioning and visual question answering.

### Main contributions

1. **An in-depth analysis of how various VLM design choices affect adversarial robustness**.
2. **A new prompt formatting method that enhances the adversarial robustness of VLMs**.
3. **Practical guidance on using text prompt techniques to enhance the robustness of VLMs**.
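The white-box objective described above (maximize \(L(f(x + \delta), y)\) subject to \(\|\delta\|_\infty \leq \epsilon\)) can be illustrated with a minimal projected gradient ascent sketch. This is plain PGD on a toy linear logistic classifier, not the Auto-PGD attack used in the paper and not a VLM. The model, the weights, and the names `loss_and_grad` and `pgd_linf` are assumptions made only for illustration, and clamping to a valid pixel range is omitted.

```python
import math

def loss_and_grad(w, b, x, y):
    """Logistic loss for a label y in {0, 1} and its gradient w.r.t. the input x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    tiny = 1e-12  # avoids log(0)
    loss = -(y * math.log(p + tiny) + (1 - y) * math.log(1 - p + tiny))
    grad = [(p - y) * wi for wi in w]  # dL/dx for a linear model
    return loss, grad

def sign(v):
    return (v > 0) - (v < 0)

def pgd_linf(w, b, x, y, epsilon=0.1, step=0.02, iters=20):
    """Gradient ascent on the loss, projected onto the L-inf ball of radius epsilon."""
    delta = [0.0] * len(x)
    for _ in range(iters):
        x_adv = [xi + di for xi, di in zip(x, delta)]
        _, grad = loss_and_grad(w, b, x_adv, y)
        # Signed gradient step, then projection (clipping) back into the ball.
        delta = [di + step * sign(gi) for di, gi in zip(delta, grad)]
        delta = [max(-epsilon, min(epsilon, di)) for di in delta]
    return delta

# Toy model and input (hypothetical values).
w, b = [2.0, -1.0, 0.5], 0.1
x, y = [0.5, 0.2, 0.9], 1

delta = pgd_linf(w, b, x, y)
clean_loss, _ = loss_and_grad(w, b, x, y)
adv_loss, _ = loss_and_grad(w, b, [xi + di for xi, di in zip(x, delta)], y)
print(f"clean loss {clean_loss:.3f}, adversarial loss {adv_loss:.3f}")
```

In a real attack on a VLM the gradient would come from automatic differentiation through the vision encoder and language model rather than from a closed-form expression.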
### Conclusion

The authors' research shows that although increasing the image resolution or the scale of the language model does not significantly improve the adversarial robustness of VLMs, simple prompt formatting modifications (such as hinting at possible adversarial perturbations) can significantly improve the robustness of the model. This provides important guidance for the future development of safer and more reliable VLMs.
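The prompt formatting idea summarized above can be sketched as a small helper that builds the four prompt variants. The summary does not give the exact wording the paper uses, so the template strings, the style names, and the function `format_prompt` below are hypothetical. Treating the random prompt as a random-string prefix is also an assumption.

```python
import random
import string

# Hypothetical templates: the wording used in the paper is not given in this summary.
HINTS = {
    "original": "",
    "adversarial_deterministic": "The image has been adversarially perturbed. ",
    "adversarial_likelihood": "The image may have been adversarially perturbed. ",
}

def format_prompt(question, style="original", seed=0, random_len=8):
    """Prepend a hint (or a random control string) to the task question."""
    if style == "random":
        rng = random.Random(seed)  # seeded so the control prompt is reproducible
        letters = (rng.choice(string.ascii_lowercase) for _ in range(random_len))
        prefix = "".join(letters) + " "
    elif style in HINTS:
        prefix = HINTS[style]
    else:
        raise ValueError(f"unknown prompt style: {style}")
    return prefix + question

question = "What is shown in the image?"
prompts = {style: format_prompt(question, style) for style in [*HINTS, "random"]}
for style, prompt in prompts.items():
    print(f"{style}: {prompt}")
```

Each variant would be paired with the same (possibly perturbed) image, so that any change in captioning or question-answering accuracy can be attributed to the prompt alone.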

Towards Adversarially Robust Vision-Language Models: Insights from Design Choices and Prompt Formatting Techniques

Safeguarding Vision-Language Models Against Patched Visual Prompt Injectors

On Evaluating Adversarial Robustness of Large Vision-Language Models

Towards Adversarial Attack on Vision-Language Pre-training Models

A Hybrid Defense Strategy for Boosting Adversarial Robustness in Vision-Language Models

Revisiting the Adversarial Robustness of Vision Language Models: a Multimodal Perspective

Chain of Attack: On the Robustness of Vision-Language Models Against Transfer-Based Adversarial Attacks

Adversarial Prompt Tuning for Vision-Language Models

MirrorCheck: Efficient Adversarial Defense for Vision-Language Models

On the Robustness of Multimodal Large Language Models

Break the Visual Perception: Adversarial Attacks Targeting Encoded Visual Tokens of Large Vision-Language Models

VLATTACK: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models

Effective and Efficient Adversarial Detection for Vision-Language Models via A Single Vector

One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models

Backdooring Vision-Language Models with Out-Of-Distribution Data

Replace-then-Perturb: Targeted Adversarial Attacks With Visual Reasoning for Vision-Language Models

Seeing is Deceiving: Exploitation of Visual Pathways in Multi-Modal Language Models

Exploring Visual Vulnerabilities via Multi-Loss Adversarial Search for Jailbreaking Vision-Language Models

Fooling Vision and Language Models Despite Localization and Attention Mechanism

On the Robustness of Large Multimodal Models Against Image Adversarial Attacks

Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics