Targeted Black-Box Adversarial Attack Method for Image Classification Models

Su Zheng, Jialin Chen, Lingli Wang
DOI: https://doi.org/10.1109/ijcnn.2019.8852078
2019-01-01
Abstract: Deep neural networks (DNNs) are widely applied to image classification tasks. Because these models are usually vulnerable, subtle pixel perturbations can cause misclassification, which poses a serious threat to DNN applications. Pixel perturbations can also mislead other pattern recognition models such as Naive Bayes (NB), Decision Tree (DT) and Random Forest (RF). In this paper, a general method is proposed for carrying out targeted black-box attacks on image classification models. On CIFAR-10, the proposed method achieves targeted fool rates (TFRs) of 0.873 with access to the target model's training set and 0.781 without it. In cross-model attacks, it still achieves a TFR of 0.630 on CIFAR-10. Furthermore, on CIFAR-100 the method can mount attacks for up to 100 classes with a TFR of 0.721, successfully handling 99 cases for each class (one for each of the other classes). In our experiments, the proposed method shows higher performance and reliability than other black-box attack methods: for attacks trained on a single model on CIFAR-10, its maximum TFR is 0.123 higher and its minimum TFR is 0.602 higher than those of the previous methods UPSET and ANGRI.
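The results above are reported as targeted fool rates (TFRs). The minimal Python sketch below assumes that TFR means the fraction of adversarial inputs that the attacked model assigns to the attacker-chosen target class; the abstract does not define it formally. The function names `targeted_fool_rate`, `per_target_tfr` and `source_target_pairs` are hypothetical, and grouping by target class is only one possible reading of "maximum" and "minimum" TFR. The sketch also counts source/target pairs to show where "99 cases for each class" on CIFAR-100 comes from: each of the 100 classes pairs with the 99 others.

```python
from itertools import permutations


def targeted_fool_rate(predictions, targets):
    """Assumed TFR: share of adversarial inputs classified as their target class."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not predictions:
        raise ValueError("need at least one example")
    hits = sum(1 for pred, tgt in zip(predictions, targets) if pred == tgt)
    return hits / len(predictions)


def per_target_tfr(predictions, targets):
    """Hypothetical breakdown: TFR computed separately for each target class."""
    counts = {}  # target class -> (hits, total)
    for pred, tgt in zip(predictions, targets):
        hits, total = counts.get(tgt, (0, 0))
        counts[tgt] = (hits + (pred == tgt), total + 1)
    return {tgt: hits / total for tgt, (hits, total) in counts.items()}


def source_target_pairs(num_classes):
    """All ordered (source, target) class pairs with source != target."""
    return list(permutations(range(num_classes), 2))


if __name__ == "__main__":
    preds = [3, 3, 1, 7]  # toy model outputs on adversarial inputs
    tgts = [3, 5, 1, 7]   # attacker-chosen target classes
    print(targeted_fool_rate(preds, tgts))  # 0.75
    rates = per_target_tfr(preds, tgts)
    print(max(rates.values()), min(rates.values()))  # 1.0 0.0
    print(len(source_target_pairs(100)))  # 9900 = 100 classes x 99 targets
```

Under these assumptions, a method that is reliable across all source/target pairs raises the minimum per-pair rate, not just the average.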