Abstract: It is becoming increasingly clear that many machine learning classifiers are vulnerable to adversarial examples. In attempting to explain the origin of adversarial examples, previous studies have typically attributed them to the high dimensionality of the data that neural networks operate on, to overfitting, or to networks being too linear. Here we argue that adversarial examples arise primarily from an inherent uncertainty that neural networks have about their predictions. We show that the functional form of this uncertainty is independent of architecture, dataset, and training protocol, and depends only on the statistics of the network's logit differences, which do not change significantly during training. As a result, adversarial error follows a universal power-law scaling with respect to the size of the adversarial perturbation. We show that this universality holds across a broad range of datasets (MNIST, CIFAR10, ImageNet, and random data), models (including state-of-the-art deep networks, linear models, adversarially trained networks, and networks trained on randomly shuffled labels), and attacks (FGSM, step l.l., PGD). Motivated by these results, we study the effect of reducing prediction entropy on adversarial robustness. Finally, we study the effect of network architecture on adversarial sensitivity. To do this, we use neural architecture search with reinforcement learning to find adversarially robust architectures on CIFAR10. The resulting architecture is more robust to both white-box and black-box attacks than previous attempts.
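The following toy sketch shows how adversarial error can be tied to the statistics of logit differences. It works under stated assumptions and is not the authors' experimental setup. It uses a binary linear classifier (one of the model classes listed above) with fixed weights on synthetic Gaussian data, attacked with FGSM. For such a model, FGSM lowers every example's margin by exactly ε‖w‖₁. The fraction of correctly classified inputs that flip is therefore the empirical CDF of the margins (the logit differences) evaluated at ε‖w‖₁. Several pieces are hypothetical choices made only for illustration:

- the helper names `fgsm_linear`, `flip_rate`, and `fit_power_law`;
- the data generator;
- the use of "flip rate of initially correct examples" as the error measure.

The exponent is estimated with a least-squares line in log-log space.

```python
import numpy as np


def fgsm_linear(x, y, w, eps):
    """FGSM on a binary linear model with logit x @ w + b and labels y in {-1, +1}.

    The input gradient of log(1 + exp(-y * logit)) is -y * w * sigmoid(-y * logit);
    the sigmoid factor is positive, so the gradient's sign is -y * sign(w).
    """
    return x - eps * y[:, None] * np.sign(w)[None, :]


def flip_rate(x, y, w, b, eps):
    """Fraction of initially correct examples that FGSM with budget eps misclassifies."""
    margin = y * (x @ w + b)                  # signed logit difference per example
    ok = margin > 0                           # keep only correctly classified inputs
    x_adv = fgsm_linear(x[ok], y[ok], w, eps)
    adv_margin = y[ok] * (x_adv @ w + b)      # equals margin - eps * ||w||_1
    return float(np.mean(adv_margin <= 0))


def fit_power_law(eps, err):
    """Least-squares fit of log(err) = log(c) + k * log(eps), skipping zero errors."""
    eps, err = np.asarray(eps), np.asarray(err)
    keep = err > 0
    k, log_c = np.polyfit(np.log(eps[keep]), np.log(err[keep]), 1)
    return float(np.exp(log_c)), float(k)


# Hypothetical toy setup: Gaussian inputs, fixed weights, labels with small noise.
rng = np.random.default_rng(0)
n, d = 20000, 100
w = rng.normal(size=d)
b = 0.0
x = rng.normal(size=(n, d))
y = np.sign(x @ w + 0.5 * rng.normal(size=n))

eps_grid = np.logspace(-3, -2, 8)             # small perturbation budgets
errs = np.array([flip_rate(x, y, w, b, e) for e in eps_grid])
c, k = fit_power_law(eps_grid, errs)
print(f"adversarial flip rate ~ {c:.3g} * eps^{k:.2f}")
```

The fitted exponent is governed by how densely the margins sit near zero. When that density is roughly flat, the flip rate grows roughly linearly in ε for small ε. This toy model makes no claim about the exponents reported in the paper.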
