Beware the Black-Box: on the Robustness of Recent Defenses to Adversarial Examples

Kaleel Mahmood, Deniz Gurevin, Marten van Dijk, Phuong Ha Nguyen
DOI: https://doi.org/10.3390/e23101359
2021-05-21
Abstract: Many defenses have recently been proposed at venues like NIPS, ICML, ICLR and CVPR. These defenses are mainly focused on mitigating white-box attacks. They do not properly examine black-box attacks. In this paper, we expand upon the analysis of these defenses to include adaptive black-box adversaries. Our evaluation is done on nine defenses including Barrage of Random Transforms, ComDefend, Ensemble Diversity, Feature Distillation, The Odds are Odd, Error Correcting Codes, Distribution Classifier Defense, K-Winner Take All and Buffer Zones. Our investigation is done using two black-box adversarial models and six widely studied adversarial attacks for CIFAR-10 and Fashion-MNIST datasets. Our analyses show most recent defenses (7 out of 9) provide only marginal improvements in security ($<25\%$), as compared to undefended networks. For every defense, we also show the relationship between the amount of data the adversary has at their disposal, and the effectiveness of adaptive black-box attacks. Overall, our results paint a clear picture: defenses need both thorough white-box and black-box analyses to be considered secure. We provide this large scale study and analyses to motivate the field to move towards the development of more robust black-box defenses.
Machine Learning
What problem does this paper attempt to address?
The main problem this paper addresses is that many current defenses against adversarial examples have not been shown to hold up under black-box attacks. Specifically, the paper points out:

1. **White-box robustness does not imply black-box robustness**: Many existing defenses focus on mitigating white-box attacks, in which the attacker generates adversarial examples with full knowledge of the model parameters and architecture. These defenses have not been fully evaluated under black-box attacks, in which the attacker has no knowledge of the model parameters and architecture. Because a white-box attack can fail due to gradient masking while a black-box attack remains effective, security has to be assessed under both white-box and black-box attacks.
2. **Lack of comprehensive black-box evaluation**: Most defense papers study white-box attacks in detail but pay less attention to black-box attacks. As a result, the actual effectiveness of existing defenses under black-box attacks is not well understood.

To address these problems, the paper makes the following contributions:

- **Comprehensive black-box defense analysis**: The paper selects 9 recently proposed defenses and evaluates them using 12 different attack methods. Each defense is trained and tested under the same conditions so that the results are comparable.
- **Strength of adaptive black-box attacks**: For the first time, the paper shows how the amount of training data available to the adversary affects the effectiveness of adaptive black-box attacks, revealing how each defense performs against black-box attacks of different strengths.
- **Open-source code and detailed implementation**: To help the community develop stronger black-box defenses, the paper provides experimental code and detailed implementation instructions.

Through this work, the paper aims to advance the field of adversarial machine learning, in particular by improving the robustness of black-box defenses.
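
The relationship between the adversary's data budget and attack effectiveness can be illustrated with a toy sketch. This is not the paper's implementation: the linear `target_oracle`, the logistic-regression surrogate, the one-step sign-gradient perturbation, and every function name and parameter below are assumptions chosen for illustration, and the paper's own experiments use deep networks on CIFAR-10 and Fashion-MNIST. The sketch only shows the general pattern of an adaptive black-box attack: query the target for labels, train a surrogate on those labels, craft perturbations on the surrogate, and check how often they transfer to the target.

```python
import math
import random


def target_oracle(x):
    """Hypothetical black-box target: exposes only a hard label (0 or 1)."""
    # These coefficients stand in for hidden parameters the attacker never reads.
    return 1 if 2.0 * x[0] - 1.0 * x[1] + 0.25 > 0 else 0


def sigmoid(z):
    # Clamp the logit to avoid overflow in math.exp.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_surrogate(queries, labels, epochs=100, lr=0.1):
    """Fit a logistic-regression surrogate to oracle-labeled queries with SGD."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(queries, labels):
            err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y
            w[0] -= lr * err * x[0]
            w[1] -= lr * err * x[1]
            b -= lr * err
    return w, b


def sign(v):
    return (v > 0) - (v < 0)


def fgsm_on_surrogate(x, y, w, eps):
    """One-step sign-gradient perturbation computed from the surrogate alone."""
    # For logistic regression the loss gradient w.r.t. x is (p - y) * w, and
    # (p - y) is negative when y == 1 and positive when y == 0.
    direction = -1 if y == 1 else 1
    return [xi + eps * direction * sign(wi) for xi, wi in zip(x, w)]


def transfer_success_rate(n_queries, eps=0.3, n_test=500, seed=0):
    """Fraction of test points whose oracle label flips after the transfer attack."""
    rng = random.Random(seed)

    def sample():
        return [rng.uniform(-1, 1), rng.uniform(-1, 1)]

    queries = [sample() for _ in range(n_queries)]
    labels = [target_oracle(q) for q in queries]  # label-only access to the target
    w, _ = train_surrogate(queries, labels)
    flipped = 0
    for _ in range(n_test):
        x = sample()
        y = target_oracle(x)  # clean label used as the attack's reference label
        x_adv = fgsm_on_surrogate(x, y, w, eps)
        flipped += target_oracle(x_adv) != y
    return flipped / n_test


if __name__ == "__main__":
    # Vary the query budget to see how surrogate quality affects transfer.
    for budget in (5, 50, 500):
        print(budget, transfer_success_rate(budget))
```

In this toy setting the surrogate only needs to recover the signs of the target's weights, so a small query budget may already suffice. With deep networks and real defenses the dependence on data is what the paper measures empirically.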