Wasserstein distributional robustness of neural networks

Xingjian Bai, Guangyi He, Yifan Jiang, Jan Obloj
2023-06-16
Abstract: Deep neural networks are known to be vulnerable to adversarial attacks (AA). For an image recognition task, this means that a small perturbation of the original can result in the image being misclassified. Design of such attacks as well as methods of adversarial training against them are subject of intense research. We re-cast the problem using techniques of Wasserstein distributionally robust optimization (DRO) and obtain novel contributions leveraging recent insights from DRO sensitivity analysis. We consider a set of distributional threat models. Unlike the traditional pointwise attacks, which assume a uniform bound on perturbation of each input data point, distributional threat models allow attackers to perturb inputs in a non-uniform way. We link these more general attacks with questions of out-of-sample performance and Knightian uncertainty. To evaluate the distributional robustness of neural networks, we propose a first-order AA algorithm and its multi-step version. Our attack algorithms include Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD) as special cases. Furthermore, we provide a new asymptotic estimate of the adversarial accuracy against distributional threat models. The bound is fast to compute and first-order accurate, offering new insights even for the pointwise AA. It also naturally yields out-of-sample performance guarantees. We conduct numerical experiments on the CIFAR-10 dataset using DNNs on RobustBench to illustrate our theoretical results. Our code is available at https://github.com/JanObloj/W-DRO-Adversarial-Methods.
Machine Learning, Computer Vision and Pattern Recognition, Optimization and Control, Probability
What problem does this paper attempt to address?
The paper addresses how to evaluate, and ultimately improve, the robustness of deep neural networks against adversarial attacks by recasting the problem as Wasserstein distributionally robust optimization (W-DRO). Specifically, the paper focuses on:

1. **Redefining the adversarial attack problem**: The adversarial attack problem is reformulated in the W-DRO framework. Unlike traditional pointwise attacks, which bound the perturbation of every input uniformly, distributional threat models allow the attacker to perturb inputs in a non-uniform way, which better reflects real-world uncertainty and data perturbations.
2. **Proposing new attack methods**: Within the W-DRO framework, the paper proposes a first-order adversarial attack algorithm and a multi-step version of it. These include the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD) as special cases, so they cover the classic pointwise attacks and extend them to distributional threat models.
3. **Providing theoretical guarantees**: The paper derives a new asymptotic estimate of the adversarial accuracy of a network under distributional threat models. The estimate is fast to compute and first-order accurate, offers new insights even for pointwise attacks, and naturally yields out-of-sample performance guarantees.
4. **Experimental verification**: Numerical experiments on the CIFAR-10 dataset, using networks from RobustBench, illustrate the theoretical results and the proposed algorithms. They show that even networks that perform well under pointwise attacks lose significant adversarial accuracy under distributional attacks.

### Formula Summary

1. **Mathematical form of the Wasserstein DRO problem**:
   \[ \inf_{\theta \in \Theta} \sup_{Q \in B_\delta(P)} \mathbb{E}_Q[L(f_\theta(x), y)] \]
   where \( B_\delta(P) \) is the Wasserstein ball centered at \( P \) with radius \( \delta \).
2. **First-order approximation of the adversarial loss**: Writing \( J_\theta(x, y) = L(f_\theta(x), y) \) and \( V(\delta) = \sup_{Q \in B_\delta(P)} \mathbb{E}_Q[J_\theta(x, y)] \),
   \[ V(\delta) = V(0) + \delta \Upsilon + o(\delta) \]
   where
   \[ \Upsilon = \left( \mathbb{E}_P \left[ \|\nabla_x J_\theta(x, y)\|_*^q \right] \right)^{1/q} \]
   Here \( \|\cdot\|_* \) is the dual of the norm used to measure input perturbations and \( q \) is the conjugate exponent of the Wasserstein order \( p \), that is \( 1/p + 1/q = 1 \). Since \( V(\delta) \) is a supremum over a ball that grows with \( \delta \), it is non-decreasing, so \( \Upsilon \geq 0 \).
3. **Definition of adversarial accuracy**:
   \[ A_\delta = \inf_{Q \in B_\delta(P)} Q(S) = \inf_{Q \in B_\delta(P)} \mathbb{E}_Q[1_S] \]
   where \( S \) is the set of correctly classified pairs \( (x, y) \).
4. **Asymptotic lower bound of adversarial accuracy**:
   \[ R_\delta \geq \frac{W(0) - V(\delta)}{W(0) - V(0)} + o(\delta) \]
   where \( R_\delta = A_\delta / A_0 \) is the ratio of adversarial to clean accuracy. The quantities at \( \delta = 0 \) are evaluated under \( P \) itself, because the ball \( B_0(P) \) contains only \( P \):
   \[ W(0) = \mathbb{E}_P[J_\theta(x, y) \mid S^c] \]
   \[ V(0) = \mathbb{E}_P[J_\theta(x, y)] \]
   so \( W(0) \) is the mean loss on misclassified inputs and \( V(0) \) is the clean expected loss, consistent with \( V(\delta) \) in item 2.

Through these contributions, the paper provides new theoretical and practical tools for evaluating and improving the robustness of neural networks under distributional adversarial attacks.
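To make the difference between pointwise and distributional attacks concrete, here is a minimal NumPy sketch of a single first-order attack step. It is an illustration under stated assumptions, not the authors' implementation: perturbations are measured in the l_inf norm (whose dual is l_1), the Wasserstein order `p` is greater than 1, the input gradients are already available as an array, and the function name `w_fgsm_step` is hypothetical. Each sample's step is rescaled by its gradient dual norm relative to the sensitivity term, and the rescaling disappears when `p` is infinite, which gives the FGSM update.

```python
import numpy as np

def w_fgsm_step(x, grad, delta, p):
    """One first-order distributional attack step (illustrative sketch).

    x, grad: arrays of shape (n_samples, n_features); grad holds the
    gradient of the loss with respect to each input.
    delta:   attack budget (radius of the Wasserstein ball).
    p:       Wasserstein order, p > 1; np.inf gives the pointwise model.
    Assumes at least one gradient is non-zero.
    """
    q = 1.0 if np.isinf(p) else p / (p - 1.0)        # conjugate exponent of p
    dual_norms = np.abs(grad).sum(axis=1)             # l_1 norm, dual of l_inf
    upsilon = np.mean(dual_norms ** q) ** (1.0 / q)   # first-order sensitivity
    # Non-uniform budget: samples with larger gradients get larger steps.
    scale = (dual_norms / upsilon) ** (q - 1.0)
    return x + delta * scale[:, None] * np.sign(grad)

# Toy data standing in for inputs and their loss gradients.
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))
grad = rng.normal(size=(6, 4))

x_adv_pointwise = w_fgsm_step(x, grad, delta=0.1, p=np.inf)  # FGSM special case
x_adv_w2 = w_fgsm_step(x, grad, delta=0.1, p=2)              # non-uniform steps
```

With `p=2`, individual samples can move by more or less than `delta`, but the root-mean-square of the per-sample l_inf perturbations equals `delta`, so the perturbed sample stays on the boundary of the Wasserstein ball.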
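The asymptotic lower bound in the formula summary can be estimated from quantities computed on clean data only. The sketch below assumes that W(0) is the mean clean loss over misclassified samples, that V(0) is the mean clean loss over all samples, and that V(delta) is replaced by its first-order approximation V(0) + delta * Upsilon. The function name `accuracy_ratio_lower_bound` and its arguments are hypothetical, and the data must contain at least one misclassified sample.

```python
import numpy as np

def accuracy_ratio_lower_bound(losses, correct, grad_dual_norms, delta, q):
    """First-order estimate of the lower bound on R_delta (illustrative sketch).

    losses:          per-sample clean losses J(x, y)
    correct:         per-sample booleans, True if classified correctly
    grad_dual_norms: per-sample dual norms of the input gradient of the loss
    delta:           attack budget
    q:               conjugate exponent of the Wasserstein order p
    """
    losses = np.asarray(losses, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    norms = np.asarray(grad_dual_norms, dtype=float)

    v0 = losses.mean()                      # V(0): clean expected loss
    w0 = losses[~correct].mean()            # W(0): assumed mean loss on misclassified samples
    upsilon = np.mean(norms ** q) ** (1.0 / q)
    v_delta = v0 + delta * upsilon          # first-order approximation of V(delta)
    return (w0 - v_delta) / (w0 - v0)

# Toy numbers: two correct samples with low loss, two misclassified with high loss.
bound = accuracy_ratio_lower_bound(
    losses=[0.1, 0.2, 2.0, 3.0],
    correct=[True, True, False, False],
    grad_dual_norms=[1.0, 1.0, 1.0, 1.0],
    delta=0.1,
    q=1.0,
)
```

Under these assumptions the estimate simplifies to 1 - delta * Upsilon / (W(0) - V(0)), so it equals 1 at `delta=0` and decreases linearly as the budget grows. It needs one gradient evaluation per sample and no attack iterations.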