Unsupervised Adversarial Example Detection of Vision Transformers for Trustworthy Edge Computing

Jiaxing Li, Yu’an Tan, Jie Yang, Zhengdao Li, Heng Ye, Chenxiao Xia, Yuanzhang Li
DOI: https://doi.org/10.1145/3674981
2024-07-02
Abstract: Many edge computing applications based on computer vision harness the power of deep learning. As an emerging deep learning model for vision, the Vision Transformer has recently achieved record-breaking performance in various vision tasks. However, recent studies on its robustness show that the Vision Transformer remains vulnerable to adversarial attacks, which cause the model to misclassify its input. In this work, we ask an intriguing question: “Can adversarial perturbations against Vision Transformers be detected with model explanations?” Driven by this question, we observe that benign samples and adversarial examples yield different attribution maps when the Grad-CAM interpretability method is applied to the Vision Transformer model. We demonstrate that an adversarial example is a Feature Shift of the input data, which leads to an Attention Deviation of the vision model. We propose a framework that captures the Attention Deviation of vision models to defend against adversarial attacks. Experiments show that our framework achieves the expected results.
computer science, information systems, theory & methods, software engineering
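The abstract's central idea, that adversarial examples produce attribution maps that deviate from those of benign samples, can be illustrated with a toy sketch. This is not the authors' framework. It assumes the attribution maps have already been computed (for example with Grad-CAM) and replaces them with synthetic 14x14 arrays. The helper names `attention_deviation`, `fit_detector`, `is_adversarial`, and `blob`, the single mean reference map, the cosine-distance score, and the 0.95 quantile threshold are all hypothetical choices made for illustration. The detector is unsupervised in the sense that it is calibrated on benign maps only.

```python
import numpy as np


def attention_deviation(attr_map, reference_map):
    """Cosine distance between a flattened attribution map and a reference map."""
    a = attr_map.ravel().astype(float)
    r = reference_map.ravel().astype(float)
    denom = np.linalg.norm(a) * np.linalg.norm(r)
    if denom == 0.0:
        return 1.0  # treat an all-zero map as maximally deviant
    return 1.0 - float(a @ r) / denom


def fit_detector(benign_maps, quantile=0.95):
    """Calibrate on benign maps only: mean reference map plus a score threshold."""
    reference = np.mean(benign_maps, axis=0)
    scores = np.array([attention_deviation(m, reference) for m in benign_maps])
    threshold = float(np.quantile(scores, quantile))
    return reference, threshold


def is_adversarial(attr_map, reference, threshold):
    """Flag an input whose attribution map deviates more than the benign threshold."""
    return attention_deviation(attr_map, reference) > threshold


def blob(center, size=14, sigma=2.0):
    """Synthetic stand-in for an attribution map: a Gaussian bump at `center`."""
    ys, xs = np.mgrid[0:size, 0:size]
    return np.exp(-((ys - center[0]) ** 2 + (xs - center[1]) ** 2) / (2 * sigma**2))


rng = np.random.default_rng(0)

# Assumption: benign maps focus on the object (image center) with small noise.
benign_maps = np.stack(
    [blob((7, 7)) + 0.05 * rng.random((14, 14)) for _ in range(50)]
)
reference, threshold = fit_detector(benign_maps)

# Assumption: an adversarial input shifts the model's focus to another region.
shifted_map = blob((2, 11)) + 0.05 * rng.random((14, 14))

print("threshold:", threshold)
print("shifted map flagged:", is_adversarial(shifted_map, reference, threshold))
```

In practice, benign attribution maps differ from image to image, so a single mean reference map is a strong simplification; a learned model of benign maps would be a more realistic substitute for `fit_detector`.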