Abstract: Pre-trained Vision-Language Models (VLMs) perform well on a wide range of vision-language tasks. However, they are inherently vulnerable to transferable adversarial examples, which can undermine their performance and reliability in real-world applications. Cross-modal interactions have been shown to be key to boosting adversarial transferability, yet existing multimodal adversarial attacks make only limited use of them. Stable Diffusion, which contains multiple cross-attention modules, has great potential to improve adversarial transferability by exploiting abundant cross-modal interactions. We therefore propose a Multimodal Diffusion-based Attack (MDA), which conducts adversarial attacks against VLMs using Stable Diffusion. Specifically, MDA first generates adversarial text, which is then used to optimize the adversarial image during the diffusion process. Besides using the adversarial text to compute the downstream loss, MDA also takes it as the guiding prompt for adversarial image generation during the denoising process. This adds a further form of cross-modal interaction and thereby strengthens adversarial transferability. Unlike pixel-based attacks, MDA introduces perturbations in the latent space rather than the pixel space so as to manipulate high-level semantics, which also improves adversarial transferability. Experimental results demonstrate that the adversarial examples generated by MDA are highly transferable across different VLMs on different downstream tasks, surpassing state-of-the-art methods by a large margin.
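The latent-space idea in the abstract can be illustrated with a toy sketch. It is not the MDA implementation: the linear "decoder" and "surrogate encoder" matrices, the dot-product similarity objective, the function names, and the step size and bound are all assumptions made for illustration, and the diffusion denoising loop and prompt guidance are omitted entirely. The sketch only shows what it means to optimize a bounded perturbation on a latent code, rather than on pixels, against a loss that involves an adversarial text embedding.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins (assumptions, not the paper's models):
# a linear "decoder" (latent -> image) and a linear surrogate
# "image encoder" (image -> embedding).
D = rng.normal(size=(64, 16)) / 4.0   # toy decoder weights
E = rng.normal(size=(8, 64)) / 8.0    # toy surrogate image-encoder weights

def embed_image(latent):
    """Decode the latent to an image, then embed it with the surrogate."""
    return E @ (D @ latent)

def latent_attack(latent, text_emb, steps=50, lr=0.1, eps=0.5):
    """Lower image-text similarity by perturbing the latent, not the pixels."""
    delta = np.zeros_like(latent)
    # For these linear toy models the gradient of the dot-product
    # similarity with respect to the latent has a closed form.
    grad = (E @ D).T @ text_emb
    for _ in range(steps):
        delta -= lr * np.sign(grad)        # signed gradient step
        delta = np.clip(delta, -eps, eps)  # keep the perturbation bounded
    return latent + delta

latent = rng.normal(size=16)
adv_text_emb = rng.normal(size=8)  # placeholder for an adversarial text embedding

clean_sim = float(adv_text_emb @ embed_image(latent))
adv_latent = latent_attack(latent, adv_text_emb)
adv_sim = float(adv_text_emb @ embed_image(adv_latent))
```

In this toy setting `adv_sim` ends up below `clean_sim` while every latent coordinate moves by at most `eps`. A real attack would replace the closed-form gradient with backpropagation through the denoising steps and the surrogate VLM.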

An Optimized Transfer Attack Framework Towards Multi-Modal Machine Learning

Based on Max-Min Framework Transferable Adversarial Attacks

MMAD-Purify: A Precision-Optimized Framework for Efficient and Scalable Multi-Modal Attacks

Transfer Attacks Revisited: A Large-Scale Empirical Study in Real Computer Vision Settings

A Survey on Transferability of Adversarial Examples across Deep Neural Networks

Toward Understanding and Boosting Adversarial Transferability from a Distribution Perspective

Evading Defenses to Transferable Adversarial Examples by Translation-Invariant Attacks

Delving into Transferable Adversarial Examples and Black-box Attacks

Generating Universal Language Adversarial Examples by Understanding and Enhancing the Transferability Across Neural Models

Highly Transferable Diffusion-based Unrestricted Adversarial Attack on Pre-trained Vision-Language Models

Enhancing the Adversarial Transferability with Channel Decomposition

On the Transferability of Adversarial Attacks against Neural Text Classifier

Mutual-modality Adversarial Attack with Semantic Perturbation

Enhancing Transferability of Adversarial Examples Through Mixed-Frequency Inputs

Improving Adversarial Transferability with Neighbourhood Gradient Information

Cross-Modality Attack Boosted by Gradient-Evolutionary Multiform Optimization

An ADMM-Based Universal Framework for Adversarial Attacks on Deep Neural Networks

Bag of Tricks to Boost Adversarial Transferability

Improving Adversarial Transferability by Stable Diffusion

Enhancing Adversarial Transferability with Adversarial Weight Tuning

Adaptive Cross-Modal Transferable Adversarial Attacks From Images to Videos