Abstract: Vision-Language Pre-training (VLP) models have achieved remarkable success in practice, yet they are easily misled by adversarial attacks. Though harmful, adversarial attacks are valuable for revealing the blind spots of VLP models and improving their robustness. However, existing adversarial attack studies pay insufficient attention to the key roles of different modality-correlated features, leading to unsatisfactory transferable attack performance. To tackle this issue, we propose the Transferable MultiModal (TMM) attack framework, which tailors both the modality-consistency and modality-discrepancy features. To promote transferability, we propose attention-directed feature perturbation, which disturbs the modality-consistency features in critical attention regions. Because the commonly employed cross-attention can represent features that are consistent across diverse models, perturbing these regions is more likely to mislead the similar perception shared by different models, yielding stronger transferability. To improve attack ability, we propose orthogonal-guided feature heterogenization, which guides the adversarial perturbation to contain more modality-discrepancy features in the encoded embeddings. Since VLP models rely more on aligned features among different modalities during decision-making, increasing the modality discrepancy could confuse the learned representation and thereby strengthen the attack. Extensive experiments under diverse settings demonstrate that the proposed TMM outperforms the compared methods by large margins, i.e., a 20.47% improvement in transferable attack ability on average. Moreover, we highlight that TMM also shows outstanding attack performance on large models, such as MiniGPT-4 and Otter.

Universal Adversarial Perturbations for Vision-Language Pre-trained Models

One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models

Sample-agnostic Adversarial Perturbation for Vision-Language Pre-training Models

Towards Adversarial Attack on Vision-Language Pre-training Models

Patch is Enough: Naturalistic Adversarial Patch against Vision-Language Pre-training Models

Efficient and Effective Universal Adversarial Attack against Vision-Language Pre-training Models

VLATTACK: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models

A Unified Understanding of Adversarial Vulnerability Regarding Unimodal Models and Vision-Language Pre-training Models

Transferable Multimodal Attack on Vision-Language Pre-training Models

Revisiting the Adversarial Robustness of Vision Language Models: a Multimodal Perspective

Probing the Robustness of Vision-Language Pretrained Models: A Multimodal Adversarial Attack Approach

On Evaluating Adversarial Robustness of Large Vision-Language Models

Doubly-Universal Adversarial Perturbations: Deceiving Vision-Language Models Across Both Images and Text with a Single Perturbation

Exploring Transferability of Multimodal Adversarial Samples for Vision-Language Pre-training Models with Contrastive Learning

Adversarial Prompt Tuning for Vision-Language Models

Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models

Partially Recentralization Softmax Loss for Vision-Language Models Robustness

Improving Adversarial Transferability of Vision-Language Pre-training Models through Collaborative Multimodal Interaction

Highly Transferable Diffusion-based Unrestricted Adversarial Attack on Pre-trained Vision-Language Models

Chain of Attack: On the Robustness of Vision-Language Models Against Transfer-Based Adversarial Attacks

SA-Attack: Improving Adversarial Transferability of Vision-Language Pre-training Models via Self-Augmentation