Defending Against Universal Patch Attacks by Restricting Token Attention in Vision Transformers

Hongwei Yu, Jiansheng Chen, Huimin Ma, Cheng Yu, Xinlong Ding
DOI: https://doi.org/10.1109/icassp49357.2023.10096862
2023-01-01
ICASSP
Abstract: Previous works reveal that, like CNNs, vision transformers (ViTs) are vulnerable to universal adversarial patch attacks. In this paper, we show empirically and explain mathematically that the shallow tokens in the transformer and the network's attention can largely influence the classification result. Adversarial patches usually produce large feature norms in the corresponding shallow token vectors, which attract attention anomalously. Inspired by this, we propose a restriction operation on the attention matrix that effectively reduces the influence of the patch region. Experiments on ImageNet validate that our proposal effectively improves ViT’s robustness to white-box universal patch attacks while maintaining satisfactory classification accuracy on clean samples.
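To make the idea in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes identity query/key projections, a single attention head, and a hypothetical restriction rule that caps the mean attention any one token may receive (the column mean of the attention matrix). The names `restrict_columns` and `cap` are invented for illustration; the abstract does not specify the exact operation.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, scale):
    # Row-stochastic attention matrix: softmax(q . k / scale) per query.
    return [
        softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        for q in queries
    ]

def column_means(attn):
    # Mean attention received by each key token across all queries.
    n = len(attn)
    return [sum(row[j] for row in attn) / n for j in range(len(attn[0]))]

def restrict_columns(attn, cap):
    # Hypothetical restriction (an assumption, not the paper's rule):
    # scale down any column whose mean attention exceeds `cap`.
    # Rows are deliberately not renormalized, so the capped token
    # cannot regain its share.
    means = column_means(attn)
    factors = [min(1.0, cap / m) if m > 0 else 1.0 for m in means]
    return [[a * f for a, f in zip(row, factors)] for row in attn]

# Toy tokens: the last one stands in for a patch token with a large norm.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [6.0, 6.0]]
attn = attention(tokens, tokens, math.sqrt(2))
restricted = restrict_columns(attn, cap=0.5)

print("attention received before:", column_means(attn))
print("attention received after: ", column_means(restricted))
```

In this toy setup the large-norm token receives most of the attention from every query before the restriction, and its share is held at the cap afterwards, while the columns of the other tokens are left unchanged.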