Robustness analysis of Visual Transformer based on adversarial attacks

Yuqi Fan, Fuzhi He, Zibang Nie
DOI: https://doi.org/10.54254/2755-2721/41/20230737
2024-02-22
Abstract: As model architectures deepen, large models (e.g., Visual Transformer) perform increasingly well on vision tasks. Adversarial attacks are an important test of a model's robustness: by adding noise to the input data, they interfere with the model's discriminative ability. Previous studies have found that adversarial attacks significantly impact small models (e.g., VGG16, ResNet18), but their effect on large models needs further testing. This paper conducts three experiments to examine the performance of Visual Transformer (ViT) models against adversarial attacks. Experiment 1 uses three attack methods (FGSM, I-FGSM, and MI-FGSM) to test the performance of ViT and several small models. Experiment 2 tests whether ViT can correctly classify noisy data that successfully fooled the small models. Experiment 3 examines the defense performance of ViT and VGG16 after retraining on noisy data. The results show that (1) compared to small models, ViT has a stronger ability to resist noisy data; and (2) after retraining, the performance improvement of ViT can be greater than that of the small model.
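The three attacks named in the abstract share one idea: perturb the input in the direction that increases the model's loss, within a budget eps. The sketch below illustrates that idea with NumPy on a toy logistic-regression classifier, which is an assumption made so the input gradient has a closed form; it is not the paper's experimental code, and the function names, the [0, 1] input range, and the hyperparameters (eps, steps, mu) are hypothetical choices for illustration.

```python
import numpy as np

def loss_and_grad(x, y, w, b):
    """Binary cross-entropy of a toy logistic model and its gradient w.r.t. the input x."""
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))           # predicted probability of class 1
    loss = -(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    return loss, (p - y) * w                          # d(loss)/dx for this model

def fgsm(x, y, w, b, eps):
    """FGSM: a single step of size eps along the sign of the input gradient."""
    _, g = loss_and_grad(x, y, w, b)
    return np.clip(x + eps * np.sign(g), 0.0, 1.0)    # assume inputs live in [0, 1]

def i_fgsm(x, y, w, b, eps, steps=10):
    """I-FGSM: repeated small sign-gradient steps, kept inside the eps-ball around x."""
    alpha = eps / steps                               # per-step size (assumed schedule)
    x_adv = x.copy()
    for _ in range(steps):
        _, g = loss_and_grad(x_adv, y, w, b)
        x_adv = x_adv + alpha * np.sign(g)
        x_adv = np.clip(x_adv, x - eps, x + eps)      # project back into the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv

def mi_fgsm(x, y, w, b, eps, steps=10, mu=1.0):
    """MI-FGSM: I-FGSM with a momentum term accumulated over L1-normalized gradients."""
    alpha = eps / steps
    x_adv = x.copy()
    m = np.zeros_like(x)                              # accumulated momentum
    for _ in range(steps):
        _, g = loss_and_grad(x_adv, y, w, b)
        m = mu * m + g / (np.sum(np.abs(g)) + 1e-12)  # decay factor mu, normalized gradient
        x_adv = x_adv + alpha * np.sign(m)
        x_adv = np.clip(x_adv, x - eps, x + eps)
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv
```

Each function takes a clean input x with label y and returns a perturbed copy whose largest per-feature change is at most eps. For a deep model such as ViT or VGG16, the closed-form gradient would be replaced by one computed through automatic differentiation.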