Towards Deep Learning Models Resistant to Adversarial Attacks

Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, Adrian Vladu
2019-09-05
Abstract: Recent work has demonstrated that deep neural networks are vulnerable to adversarial examples: inputs that are almost indistinguishable from natural data and yet classified incorrectly by the network. In fact, some of the latest findings suggest that the existence of adversarial attacks may be an inherent weakness of deep learning models. To address this problem, we study the adversarial robustness of neural networks through the lens of robust optimization. This approach provides us with a broad and unifying view on much of the prior work on this topic. Its principled nature also enables us to identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal. In particular, they specify a concrete security guarantee that would protect against any adversary. These methods let us train networks with significantly improved resistance to a wide range of adversarial attacks. They also suggest the notion of security against a first-order adversary as a natural and broad security guarantee. We believe that robustness against such well-defined classes of adversaries is an important stepping stone towards fully resistant deep learning models. Code and pre-trained models are available at https://github.com/MadryLab/mnist_challenge and https://github.com/MadryLab/cifar10_challenge.
Machine Learning, Neural and Evolutionary Computing
What problem does this paper attempt to address?
This paper addresses the vulnerability of deep neural networks to adversarial examples: inputs that are almost indistinguishable from natural data yet are misclassified by the network. The authors note that some recent findings suggest this vulnerability may be an inherent weakness of deep learning models. To tackle it, they study adversarial robustness through the lens of robust optimization. This framing gives a broad, unifying view of much of the prior work and lets them identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal. In particular, the methods specify a concrete security guarantee that would protect against any adversary. With these methods, the authors train networks with significantly improved resistance to a wide range of adversarial attacks, and they put forward security against a first-order adversary as a natural and broad guarantee. They regard robustness against such well-defined classes of adversaries as an important stepping stone towards fully resistant deep learning models. In short, the core question is how to train deep neural networks that resist adversarial inputs, and the paper's answer is to treat training as a robust optimization problem.
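To make the robust-optimization view concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a toy linear classifier with logistic loss, an l-infinity perturbation budget `eps`, and projected signed-gradient ascent as the first-order adversary; none of these specifics are stated in the summary above, and all function names and numeric values are hypothetical. The inner function searches for a loss-maximizing perturbation, and the outer step updates the model on that perturbed input.

```python
import math

def margin(w, b, x, y):
    """Signed margin y * (w . x + b) of a linear classifier, y in {-1, +1}."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)

def loss(w, b, x, y):
    """Logistic loss log(1 + exp(-margin))."""
    return math.log1p(math.exp(-margin(w, b, x, y)))

def pgd_attack(w, b, x, y, eps, step=0.1, iters=10):
    """Inner maximization (assumed form): signed-gradient ascent on the loss,
    projected back onto the l-infinity ball of radius eps around x."""
    x_adv = list(x)
    for _ in range(iters):
        # d(loss)/dx = coeff * w, with coeff = -y / (1 + exp(margin))
        coeff = -y / (1.0 + math.exp(margin(w, b, x_adv, y)))
        grad = [coeff * wi for wi in w]
        # Ascent step in the direction of the gradient's sign.
        x_adv = [xa + step * ((g > 0) - (g < 0)) for xa, g in zip(x_adv, grad)]
        # Projection: clip every coordinate to [x_i - eps, x_i + eps].
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv

def adversarial_training_step(w, b, x, y, eps, lr=0.5):
    """Outer minimization: one gradient-descent step on the loss
    evaluated at the adversarial input instead of the clean one."""
    x_adv = pgd_attack(w, b, x, y, eps)
    coeff = -y / (1.0 + math.exp(margin(w, b, x_adv, y)))
    new_w = [wi - lr * coeff * xa for wi, xa in zip(w, x_adv)]
    new_b = b - lr * coeff
    return new_w, new_b

# Hypothetical toy data: one 2-D point with label +1.
w, b = [2.0, -1.0], 0.0
x, y, eps = [0.2, 0.1], 1, 0.3

x_adv = pgd_attack(w, b, x, y, eps)
loss_before = loss(w, b, x_adv, y)

for _ in range(50):
    w, b = adversarial_training_step(w, b, x, y, eps)

loss_after = loss(w, b, pgd_attack(w, b, x, y, eps), y)
print("adversarial loss before training:", loss_before)
print("adversarial loss after training: ", loss_after)
```

In this toy setting the clean point is classified correctly while the perturbed point is not, and repeated outer steps lower the loss on freshly computed adversarial inputs. A real network would replace the closed-form gradients with automatic differentiation.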