Transferable Adversarial Attacks on Vision Transformers with Token Gradient Regularization

Jianping Zhang, Yizhan Huang, Weibin Wu, Michael R. Lyu

DOI: https://doi.org/10.48550/arXiv.2303.15754

2023-06-05

Abstract: Vision transformers (ViTs) have been successfully deployed in a variety of computer vision tasks, but they are still vulnerable to adversarial samples. Transfer-based attacks use a local model to generate adversarial samples and directly transfer them to attack a target black-box model. The high efficiency of transfer-based attacks makes them a severe security threat to ViT-based applications. Therefore, it is vital to design effective transfer-based attacks to identify the deficiencies of ViTs beforehand in security-sensitive scenarios. Existing efforts generally focus on regularizing the input gradients to stabilize the update direction of adversarial samples. However, the variance of the back-propagated gradients in intermediate blocks of ViTs may still be large, which may make the generated adversarial samples focus on some model-specific features and get stuck in poor local optima. To overcome the shortcomings of existing approaches, we propose the Token Gradient Regularization (TGR) method. According to the structural characteristics of ViTs, TGR reduces the variance of the back-propagated gradient in each internal block of ViTs in a token-wise manner and utilizes the regularized gradient to generate adversarial samples. Extensive experiments on attacking both ViTs and CNNs confirm the superiority of our approach. Notably, compared to the state-of-the-art transfer-based attacks, our TGR offers a performance improvement of 8.8% on average.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses the vulnerability of Vision Transformers (ViTs) to adversarial samples. Specifically, it asks how to design effective transfer-based attacks that expose the deficiencies of ViTs in security-sensitive scenarios. Existing work usually regularizes the input gradient to stabilize the update direction of adversarial samples, but this may not reduce the variance of the back-propagated gradients in the intermediate blocks of ViTs. As a result, the generated adversarial samples tend to get stuck in poor local optima, which hurts their cross-model transferability. The paper therefore proposes Token Gradient Regularization (TGR), which aims to generate more transferable adversarial samples by reducing the variance of the back-propagated gradient of each token in the internal blocks of ViTs. Following the structural characteristics of ViTs, TGR regularizes the token gradients in each internal block and uses the regularized gradients to generate adversarial samples. In this way, TGR avoids over-relying on model-specific features and improves the transferability of adversarial samples across models. Experimental results show that, compared with state-of-the-art transfer-based attacks, TGR improves performance by 8.8% on average when attacking ViT models and by 6.2% on average when attacking CNN models. TGR can also be combined with other compatible attack algorithms to further enhance the transferability of the generated adversarial samples.
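
To make the idea of token-wise gradient regularization more concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation. It assumes one simple regularization rule (zeroing, in every channel, the k token entries with the largest absolute gradient), which is only one possible way to damp extreme token gradients. The function names, the parameter `k`, and the example matrix are all hypothetical.

```python
from statistics import pvariance


def regularize_token_gradients(grad, k=1):
    """Zero the k largest-magnitude token entries in every channel.

    grad: list of rows, one row per token and one column per channel.
    Returns a new matrix; the input is left untouched.
    (Assumed rule for illustration only, not taken from the paper.)
    """
    num_tokens = len(grad)
    num_channels = len(grad[0])
    out = [row[:] for row in grad]
    for c in range(num_channels):
        # Rank tokens by absolute gradient in this channel, largest first.
        ranked = sorted(range(num_tokens),
                        key=lambda t: abs(grad[t][c]),
                        reverse=True)
        for t in ranked[:k]:
            out[t][c] = 0.0
    return out


def channel_variances(grad):
    """Population variance of the gradient across tokens, per channel."""
    return [pvariance(col) for col in zip(*grad)]


# Hypothetical back-propagated gradient for 4 tokens and 2 channels.
# Token 1 is extreme in channel 0, token 2 is extreme in channel 1.
grad = [
    [0.1, -0.2],
    [5.0, 0.3],
    [-0.1, -4.0],
    [0.2, 0.1],
]

reg = regularize_token_gradients(grad, k=1)
print("variance before:", channel_variances(grad))
print("variance after: ", channel_variances(reg))
```

In this example the two outlier entries are zeroed and the per-channel variance across tokens drops. That is not guaranteed for arbitrary inputs under this simple rule. In an actual attack, a regularization step of this kind would be applied to the gradients flowing backward through each ViT block before they reach the input.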

Protego: Detecting Adversarial Examples for Vision Transformers Via Intrinsic Capabilities

Towards transferable adversarial attacks on vision transformers for image classification

Towards Transferable Adversarial Attacks on Image and Video Transformers

TIA: Token Importance Transferable Attack on Vision Transformers

Transferable Adversarial Attack for Both Vision Transformers and Convolutional Networks Via Momentum Integrated Gradients

Improving transferable adversarial attack for vision transformers via global attention and local drop

On Improving Adversarial Transferability of Vision Transformers

Generating Transferable Adversarial Examples Against Vision Transformers

Downstream Transfer Attack: Adversarial Attacks on Downstream Models with Pre-trained Vision Transformers

Dual stage black-box adversarial attack against vision transformer

Attacking Transformers with Feature Diversity Adversarial Perturbation

Improving the Transferability of Adversarial Examples with Restructure Embedded Patches

Enhancing the Transferability of Adversarial Attacks through Variance Tuning

When Adversarial Training Meets Vision Transformers: Recipes from Training to Architecture

Adversarial Token Attacks on Vision Transformers

Generative Transferable Adversarial Attack

Query-Efficient Hard-Label Black-Box Attack against Vision Transformers

On the Adversarial Robustness of Vision Transformers

Backdoor Attack Against Vision Transformers via Attention Gradient-Based Image Erosion

Towards Efficient Adversarial Training on Vision Transformers