Abstract:The adversarial vulnerability of neural nets, and subsequent techniques to create robust models have attracted significant attention; yet we still lack a full understanding of this phenomenon. Here, we study adversarial examples of trained neural networks through analytical tools afforded by recent theory advances connecting neural networks and kernel methods, namely the Neural Tangent Kernel (NTK), following a growing body of work that leverages the NTK approximation to successfully analyze important deep learning phenomena and design algorithms for new applications. We show how NTKs allow to generate adversarial examples in a ``training-free'' fashion, and demonstrate that they transfer to fool their finite-width neural net counterparts in the ``lazy'' regime. We leverage this connection to provide an alternative view on robust and non-robust features, which have been suggested to underlie the adversarial brittleness of neural nets. Specifically, we define and study features induced by the eigendecomposition of the kernel to better understand the role of robust and non-robust features, the reliance on both for standard classification and the robustness-accuracy trade-off. We find that such features are surprisingly consistent across architectures, and that robust features tend to correspond to the largest eigenvalues of the model, and thus are learned early during training. Our framework allows us to identify and visualize non-robust yet useful features. Finally, we shed light on the robustness mechanism underlying adversarial training of neural nets used in practice: quantifying the evolution of the associated empirical NTK, we demonstrate that its dynamics falls much earlier into the ``lazy'' regime and manifests a much stronger form of the well known bias to prioritize learning features within the top eigenspaces of the kernel, compared to standard training.

Convergence of Adversarial Training in Overparametrized Neural Networks

L G ] 1 9 Ju n 20 19 Convergence of Adversarial Training in Overparametrized Networks

Towards Robust DNNs: an Taylor Expansion-Based Method for Generating Powerful Adversarial Examples.

Over-parameterized Adversarial Training: An Analysis Overcoming the Curse of Dimensionality

Over-parameterization and Adversarial Robustness in Neural Networks: An Overview and Empirical Analysis

What Can the Neural Tangent Kernel Tell Us About Adversarial Robustness?

Evolution of Neural Tangent Kernels under Benign and Adversarial Training

A constrained optimization approach to improve robustness of neural networks

The Surprising Harmfulness of Benign Overfitting for Adversarial Robustness

Adversarial Training: embedding adversarial perturbations into the parameter space of a neural network to build a robust system

Towards Robust Training of Neural Networks by Regularizing Adversarial Gradients

Can overfitted deep neural networks in adversarial training generalize? -- An approximation viewpoint

The curse of overparametrization in adversarial training: Precise analysis of robust generalization for random features regression

Adaptive Retraining for Neural Network Robustness in Classification

Minimax rates of convergence for nonparametric regression under adversarial attacks

Interpolated Adversarial Training: Achieving robust neural networks without sacrificing too much accuracy

Adversarial Coreset Selection for Efficient Robust Training

On adversarial training and the 1 Nearest Neighbor classifier

Adversarial Robustness Vs. Model Compression, or Both?

Adversarial Training Should Be Cast as a Non-Zero-Sum Game