Abstract: Large vision-language models (LVLMs) have demonstrated remarkable capabilities in image understanding and response generation. However, this rich visual interaction also makes LVLMs vulnerable to adversarial examples. In this paper, we formulate a novel and practical targeted attack scenario in which the adversary knows only the vision encoder of the victim LVLM, with no knowledge of its prompts (which are often proprietary to service providers and not publicly available) or of its underlying large language model (LLM). This practical setting challenges the cross-prompt and cross-model transferability of targeted adversarial attacks, which aim to induce the LVLM to output a response that is semantically similar to the attacker's chosen target text. To this end, we propose an instruction-tuned targeted attack (dubbed \textsc{InstructTA}) that delivers targeted adversarial attacks on LVLMs with high transferability. First, we use a public text-to-image generative model to "reverse" the target response into a target image, and employ GPT-4 to infer a reasonable instruction $\boldsymbol{p}^\prime$ from the target response. We then build a local surrogate model (sharing the same vision encoder as the victim LVLM) to extract instruction-aware features of an adversarial image example and of the target image, and optimize the adversarial example by minimizing the distance between these two features. To further improve transferability through instruction tuning, we augment the instruction $\boldsymbol{p}^\prime$ with paraphrases of it generated by GPT-4. Extensive experiments demonstrate the superiority of our proposed method in targeted attack performance and transferability. The code is available at https://github.com/xunguangwang/InstructTA.
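The feature-matching step described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the surrogate model is replaced by a hypothetical linear map `W`, each instruction (the inferred $\boldsymbol{p}^\prime$ and its paraphrases) is represented by a hypothetical per-feature gate vector, the distance is assumed to be squared Euclidean, and the perturbation is assumed to be bounded in the L-infinity norm and optimized with projected sign-gradient steps. All names and hyperparameters below are illustrative assumptions.

```python
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def features(W, gate, x):
    """Toy 'instruction-aware' features: gate * (W x), elementwise.
    Assumption: stands in for the surrogate model's feature extractor."""
    return [g * h for g, h in zip(gate, matvec(W, x))]

def loss_and_grad(W, gates, x_adv, x_tgt):
    """Mean over instructions of 0.5 * ||f(x_adv, p) - f(x_tgt, p)||^2,
    together with its gradient with respect to x_adv."""
    n = len(x_adv)
    total, grad = 0.0, [0.0] * n
    for gate in gates:
        f_adv = features(W, gate, x_adv)
        f_tgt = features(W, gate, x_tgt)
        diff = [a - b for a, b in zip(f_adv, f_tgt)]
        total += 0.5 * sum(d * d for d in diff)
        # Chain rule for the linear toy model: grad = W^T (gate * diff)
        for row, g, d in zip(W, gate, diff):
            for j in range(n):
                grad[j] += row[j] * g * d
    k = len(gates)
    return total / k, [gj / k for gj in grad]

def targeted_attack(W, gates, x_clean, x_tgt, eps=0.1, step=0.01, iters=100):
    """Projected sign-gradient descent; returns the best iterate seen."""
    x_adv = list(x_clean)
    best_x, best_loss = list(x_clean), float("inf")
    for _ in range(iters + 1):
        loss, grad = loss_and_grad(W, gates, x_adv, x_tgt)
        if loss < best_loss:
            best_x, best_loss = list(x_adv), loss
        for j, g in enumerate(grad):
            sign = (g > 0) - (g < 0)
            v = x_adv[j] - step * sign
            # Project back into the L-infinity ball around the clean image
            v = min(max(v, x_clean[j] - eps), x_clean[j] + eps)
            # Keep pixel values in the valid [0, 1] range
            x_adv[j] = min(max(v, 0.0), 1.0)
    return best_x

# Toy data (all hypothetical): 8 "pixels", 4 features, 3 instruction variants
rng = random.Random(0)
n_pix, n_feat, eps = 8, 4, 0.1
W = [[rng.uniform(-1, 1) for _ in range(n_pix)] for _ in range(n_feat)]
gates = [[rng.uniform(0.5, 1.5) for _ in range(n_feat)] for _ in range(3)]
x_clean = [rng.random() for _ in range(n_pix)]
x_tgt = [rng.random() for _ in range(n_pix)]

x_adv = targeted_attack(W, gates, x_clean, x_tgt, eps=eps)
loss_before, _ = loss_and_grad(W, gates, x_clean, x_tgt)
loss_after, _ = loss_and_grad(W, gates, x_adv, x_tgt)
print(f"feature distance: {loss_before:.4f} -> {loss_after:.4f}")
```

In a realistic setting the linear map would presumably be replaced by the shared vision encoder plus an instruction-aware module, and the gradient would come from automatic differentiation rather than the closed form used here. Averaging the loss over several instruction variants mirrors the abstract's idea of augmenting $\boldsymbol{p}^\prime$ with paraphrases so that the perturbation does not overfit to a single prompt.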

Downstream Task-agnostic Transferable Attacks on Language-Image Pre-training Models

Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models

SA-Attack: Improving Adversarial Transferability of Vision-Language Pre-training Models via Self-Augmentation

Improving Adversarial Transferability of Vision-Language Pre-training Models through Collaborative Multimodal Interaction

AdvCLIP: Downstream-agnostic Adversarial Examples in Multimodal Contrastive Learning

Prompt-Driven Contrastive Learning for Transferable Adversarial Attacks

Downstream Transfer Attack: Adversarial Attacks on Downstream Models with Pre-trained Vision Transformers

Exploring Transferability of Multimodal Adversarial Samples for Vision-Language Pre-training Models with Contrastive Learning

Highly Transferable Diffusion-based Unrestricted Adversarial Attack on Pre-trained Vision-Language Models

An Image Is Worth 1000 Lies: Adversarial Transferability across Prompts on Vision-Language Models

Mutual-modality Adversarial Attack with Semantic Perturbation

VQAttack: Transferable Adversarial Attacks on Visual Question Answering via Pre-trained Models

Chain of Attack: On the Robustness of Vision-Language Models Against Transfer-Based Adversarial Attacks

Towards Adversarial Attack on Vision-Language Pre-training Models

Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory

OT-Attack: Enhancing Adversarial Transferability of Vision-Language Models via Optimal Transport Optimization

VLATTACK: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models

Patch is Enough: Naturalistic Adversarial Patch against Vision-Language Pre-training Models

Efficient Generation of Targeted and Transferable Adversarial Examples for Vision-Language Models Via Diffusion Models

Efficient and Effective Universal Adversarial Attack against Vision-Language Pre-training Models

InstructTA: Instruction-Tuned Targeted Attack for Large Vision-Language Models