Abstract: Many defenses have recently been proposed at venues like NIPS, ICML, ICLR and CVPR. These defenses are mainly focused on mitigating white-box attacks. They do not properly examine black-box attacks. In this paper, we expand upon the analysis of these defenses to include adaptive black-box adversaries. Our evaluation is done on nine defenses including Barrage of Random Transforms, ComDefend, Ensemble Diversity, Feature Distillation, The Odds are Odd, Error Correcting Codes, Distribution Classifier Defense, K-Winner Take All and Buffer Zones. Our investigation is done using two black-box adversarial models and six widely studied adversarial attacks for CIFAR-10 and Fashion-MNIST datasets. Our analyses show most recent defenses (7 out of 9) provide only marginal improvements in security ($<25\%$), as compared to undefended networks. For every defense, we also show the relationship between the amount of data the adversary has at their disposal, and the effectiveness of adaptive black-box attacks. Overall, our results paint a clear picture: defenses need both thorough white-box and black-box analyses to be considered secure. We provide this large scale study and analyses to motivate the field to move towards the development of more robust black-box defenses.

What problem does this paper attempt to address?

The main problem this paper addresses is that many current defenses against adversarial examples have not been shown to hold up against black-box attacks. Specifically, the paper points out:

1. **White-box defense is not equal to black-box defense**: Many existing defenses focus on mitigating white-box attacks, in which the attacker generates adversarial examples with full knowledge of the model parameters and architecture. These defenses have not been fully evaluated under black-box attacks, in which the attacker has no knowledge of the model parameters and architecture. A white-box attack may fail because of gradient masking while a black-box attack against the same defense still succeeds, so security has to be assessed under both kinds of attack.
2. **Lack of comprehensive evaluation of black-box attacks**: Most defense papers study white-box attacks in detail but pay less attention to black-box attacks. As a result, there is no comprehensive picture of how existing defenses actually perform under black-box attacks.

To address these problems, the paper carries out the following work:

- **Comprehensive black-box defense analysis**: The paper selects 9 recently proposed defenses and evaluates them using 12 different attack methods. Each defense is trained and tested under the same conditions so that the results are comparable.
- **Research on the strength of adaptive black-box attacks**: For the first time, the paper shows how the amount of training data available to the adversary affects the effectiveness of adaptive black-box attacks, revealing how each defense performs under black-box attacks of different strengths.
- **Open-source code and detailed implementation**: To help the community develop stronger black-box defenses, the paper provides experimental code and detailed implementation instructions.

Through this work, the paper aims to advance the field of adversarial machine learning, especially the robustness of defenses against black-box attacks.
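The black-box setting described above, in which an attacker trains their own model from data and transfers adversarial examples to the target, can be illustrated with a toy sketch. This is not the paper's protocol or code: the linear target `target_predict`, the logistic-regression surrogate, the FGSM-style step, and every parameter value are assumptions chosen only to keep the example small and self-contained.

```python
import numpy as np


def target_predict(x):
    """Hypothetical black-box target: the attacker only sees output labels."""
    w_secret = np.array([1.5, -2.0])  # assumed hidden weights, never read by the attacker
    return (x @ w_secret > 0).astype(int)


def train_surrogate(x, y, steps=500, lr=0.5):
    """Fit a logistic-regression surrogate on labels queried from the target."""
    w = np.zeros(x.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w)))
        w -= lr * x.T @ (p - y) / len(y)  # gradient step on the cross-entropy loss
    return w


def fgsm_on_surrogate(x, y, w, eps):
    """One signed-gradient step that increases the surrogate's loss."""
    p = 1.0 / (1.0 + np.exp(-(x @ w)))
    grad_x = (p - y)[:, None] * w[None, :]  # d(loss)/d(x) for logistic regression
    return x + eps * np.sign(grad_x)


def transfer_success_rate(n_queries, eps=0.5, seed=0):
    """Fraction of test points whose target label flips after the transfer attack."""
    rng = np.random.default_rng(seed)
    x_train = rng.normal(size=(n_queries, 2))
    y_train = target_predict(x_train)  # attacker labels its data by querying the target
    w = train_surrogate(x_train, y_train)

    x_test = rng.normal(size=(1000, 2))
    y_test = target_predict(x_test)
    x_adv = fgsm_on_surrogate(x_test, y_test, w, eps)
    return float(np.mean(target_predict(x_adv) != y_test))


if __name__ == "__main__":
    # Vary the amount of attacker data, loosely mirroring the question the paper studies.
    for n in (10, 50, 200):
        print(n, transfer_success_rate(n))
```

Changing `n_queries` varies how much labeled data the hypothetical attacker has, which is the knob the paper's data-versus-effectiveness analysis is concerned with. The numbers this toy produces say nothing about the nine defenses evaluated in the paper.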

Beware the Black-Box: on the Robustness of Recent Defenses to Adversarial Examples
