An Interpretive Adversarial Attack Method: Attacking Softmax Gradient Layer-Wise Relevance Propagation Based on Cosine Similarity Constraint and TS-Invariant

Zigang Chen, Renjie Dai, Zhenghao Liu, Long Chen, Yuhong Liu, Kai Sheng
DOI: https://doi.org/10.1007/s11063-022-11056-5
IF: 2.565
2022-01-01
Neural Processing Letters
Abstract: Deep learning has shown remarkable advantages in many fields. Although image recognition capabilities and deep neural networks (DNNs) have developed rapidly in recent years, relevant studies have confirmed that DNNs can be attacked with well-crafted images, causing the model to misrecognize them. Adversarial examples generated by traditional black-box attacks are not smooth enough and have poor transferability. We note that interpretation methods for neural networks may help us find the regions a network focuses on. Therefore, in this paper, to reduce the distortion of adversarial examples and improve their transferability, we take advantage of softmax gradient layer-wise relevance propagation (SGLRP), which distinguishes the pixels that contribute most to a classification by assigning them relevance scores. We use these scores to guide the neural network to focus on the wrong regions, and we perform experiments on a series of classical deep neural networks. The results demonstrate that by minimizing a loss function composed of an SGLRP objective based on cosine similarity, we can effectively generate transferable adversarial examples with a high peak signal-to-noise ratio (PSNR, 33.0554 dB) and structural similarity index measure (SSIM, 0.9617), achieving an average fooling rate of more than 87.53% on several DNNs. Compared with traditional attack methods, the proposed attack scheme focuses more on the common semantic features of DNNs, shows good transferability, and is able to generate high-quality adversarial examples.
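The abstract names three quantitative ingredients: a cosine-similarity comparison of relevance maps, PSNR as a distortion measure, and the fooling rate. The following is a minimal NumPy sketch of how these could be computed, assuming the SGLRP relevance maps are already available as arrays (computing SGLRP itself is out of scope here). The function `hypothetical_attack_loss` and its weight `lam` are illustrative assumptions only: the abstract does not give the exact loss, so this is not the authors' objective. The fooling rate is assumed to mean the fraction of inputs whose predicted label changes under attack. SSIM is omitted; scikit-image offers `skimage.metrics.structural_similarity` for that.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray, eps: float = 1e-12) -> float:
    """Cosine similarity between two relevance maps, compared as flat vectors."""
    a = a.ravel().astype(np.float64)
    b = b.ravel().astype(np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def hypothetical_attack_loss(r_adv: np.ndarray, r_orig: np.ndarray,
                             delta: np.ndarray, lam: float = 0.01) -> float:
    """HYPOTHETICAL objective (not the paper's exact loss): a low value means the
    adversarial relevance map points away from the original one, while the
    squared-norm term keeps the perturbation `delta` small."""
    return cosine_similarity(r_adv, r_orig) + lam * float(np.sum(np.square(delta)))


def psnr(original: np.ndarray, adversarial: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    diff = original.astype(np.float64) - adversarial.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))


def fooling_rate(clean_preds, adv_preds) -> float:
    """Assumed definition: fraction of inputs whose predicted label changes."""
    clean_preds = np.asarray(clean_preds)
    adv_preds = np.asarray(adv_preds)
    return float(np.mean(clean_preds != adv_preds))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.integers(0, 256, size=(32, 32, 3)).astype(np.float64)
    x_adv = np.clip(x + rng.normal(0.0, 2.0, size=x.shape), 0, 255)
    print(psnr(x, x_adv))                            # small noise gives a high PSNR
    print(fooling_rate([1, 2, 3, 4], [1, 5, 3, 0]))  # 0.5
```

Higher PSNR means less visible distortion, which is why the abstract reports it alongside SSIM as evidence of image quality, while the fooling rate measures attack success.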