Towards Adversarially Robust Vision-Language Models: Insights from Design Choices and Prompt Formatting Techniques

Rishika Bhagwatkar, Shravan Nayak, Reza Bayat, Alexis Roger, Daniel Z Kaplan, Pouya Bashivan, Irina Rish
2024-07-16
Abstract: Vision-Language Models (VLMs) have witnessed a surge in both research and real-world applications. However, as they are becoming increasingly prevalent, ensuring their robustness against adversarial attacks is paramount. This work systematically investigates the impact of model design choices on the adversarial robustness of VLMs against image-based attacks. Additionally, we introduce novel, cost-effective approaches to enhance robustness through prompt formatting. By rephrasing questions and suggesting potential adversarial perturbations, we demonstrate substantial improvements in model robustness against strong image-based attacks such as Auto-PGD. Our findings provide important guidelines for developing more robust VLMs, particularly for deployment in safety-critical environments.
Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
This paper addresses the robustness of Vision-Language Models (VLMs) to image-based adversarial attacks. The authors focus on two aspects:

1. **The impact of model design choices on adversarial robustness**:
   - The authors systematically study how design choices (the vision encoder, the input resolution, the size of the language model, and the combination of multiple vision encoders) affect the robustness of VLMs under white-box adversarial attacks.
   - Formal setting: the attacker is assumed to have full access to the model parameters and constructs adversarial examples from gradient information. The goal is to find a perturbation \(\delta\) that makes the model err, that is, to maximize the loss \(L(f(x + \delta), y)\) subject to the constraint \(\|\delta\|_\infty\leq\epsilon\), where \(f\) is the model, \(x\) is the original input, and \(y\) is the original label.
2. **Improving adversarial robustness through prompt formatting**:
   - The authors introduce novel, cost-effective methods that improve robustness by rephrasing the question and hinting at potential adversarial perturbations.
   - The experiments compare different prompt formats (the original prompt, the adversarial-deterministic prompt, the adversarial-likelihood prompt, and the random prompt) and evaluate their effect on image captioning and visual question answering.

### Main contributions

1. **An in-depth analysis of how various VLM design choices affect adversarial robustness**.
2. **A new prompt formatting method for improving the adversarial robustness of VLMs**.
3. **Practical guidance on using text prompting techniques to improve the robustness of VLMs**.

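
The white-box objective described under the first aspect above (maximize \(L(f(x + \delta), y)\) subject to \(\|\delta\|_\infty\leq\epsilon\)) can be illustrated with a minimal sketch. This is plain projected gradient descent (PGD) on a toy logistic model, not the Auto-PGD attack or the VLMs evaluated in the paper. The names `loss_and_grad` and `pgd_linf`, the toy weights, and the budget and step size (`8/255`, `2/255`) are assumptions chosen for illustration only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_and_grad(w, b, x, y):
    """Binary cross-entropy of a toy logistic model f(x) = sigmoid(w.x + b),
    together with the gradient of that loss with respect to the input x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
    loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    grad = [(p - y) * wi for wi in w]  # dL/dx for this model
    return loss, grad


def pgd_linf(w, b, x, y, eps=8 / 255, step=2 / 255, n_steps=10):
    """Gradient-ascent search for delta with ||delta||_inf <= eps that
    increases the loss. Assumes inputs are pixel-like values in [0, 1]."""
    delta = [0.0] * len(x)
    for _ in range(n_steps):
        x_adv = [xi + di for xi, di in zip(x, delta)]
        _, grad = loss_and_grad(w, b, x_adv, y)
        for i, g in enumerate(grad):
            sign = (g > 0) - (g < 0)
            d = delta[i] + step * sign                 # ascent step on the loss
            d = max(-eps, min(eps, d))                 # project onto the L-inf ball
            d = max(0.0, min(1.0, x[i] + d)) - x[i]    # keep x + delta in [0, 1]
            delta[i] = d
    return delta


# Toy usage with made-up weights and input (illustrative values only).
w, b = [2.0, -3.0, 1.0], 0.1
x, y = [0.5, 0.2, 0.8], 1
delta = pgd_linf(w, b, x, y)
clean_loss, _ = loss_and_grad(w, b, x, y)
adv_loss, _ = loss_and_grad(w, b, [xi + di for xi, di in zip(x, delta)], y)
print(clean_loss, adv_loss)
```

For a real VLM the input gradient would come from automatic differentiation rather than a closed form, but the loop has the same three parts: a signed gradient step, a projection onto the \(\epsilon\)-ball, and a clamp to the valid pixel range.
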
### Conclusion

The authors' results show that although increasing the image resolution or the size of the language model does not significantly improve the adversarial robustness of VLMs, simple prompt formatting changes (such as hinting at possible adversarial perturbations) can improve it significantly. This offers practical guidance for developing safer and more reliable VLMs.
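
The prompt formats named above (original, adversarial-deterministic, adversarial-likelihood, random) could be wired up roughly as follows. This is a hypothetical sketch: the summary names the formats but does not give their wording, so every template string here is invented for illustration. It assumes that "adversarial-deterministic" states that the image is perturbed, that "adversarial-likelihood" says it may be, and that "random" prepends an unrelated sentence as a control. `PROMPT_TEMPLATES` and `format_prompt` are likewise illustrative names, not the authors' code.

```python
# Hypothetical templates: the wording below is an assumption, not the
# phrasing used in the paper.
PROMPT_TEMPLATES = {
    "original": "{question}",
    "adversarial_deterministic": "The image has been adversarially perturbed. {question}",
    "adversarial_likelihood": "The image may have been adversarially perturbed. {question}",
    "random": "{filler} {question}",
}


def format_prompt(question, mode="original", filler="The weather is pleasant today."):
    """Wrap a question in one of the prompt formats before sending it to a VLM.

    `filler` is only used by the "random" format, where it stands in for an
    unrelated sentence (assumed here to act as a control condition).
    """
    if mode not in PROMPT_TEMPLATES:
        raise ValueError(f"unknown prompt mode: {mode!r}")
    return PROMPT_TEMPLATES[mode].format(question=question, filler=filler)


# Build every variant of one question for a side-by-side comparison.
question = "What is shown in the image?"
for mode in PROMPT_TEMPLATES:
    print(f"{mode}: {format_prompt(question, mode)}")
```

Because only the text prompt changes, a comparison like this needs no retraining or change to the model, which is what makes the approach cost-effective.
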