Abstract: Deep neural networks are known to be vulnerable to adversarial attacks (AA). For an image recognition task, this means that a small perturbation of the original can result in the image being misclassified. The design of such attacks, as well as methods of adversarial training against them, is the subject of intense research. We re-cast the problem using techniques of Wasserstein distributionally robust optimization (DRO) and obtain novel contributions leveraging recent insights from DRO sensitivity analysis. We consider a set of distributional threat models. Unlike the traditional pointwise attacks, which assume a uniform bound on perturbation of each input data point, distributional threat models allow attackers to perturb inputs in a non-uniform way. We link these more general attacks with questions of out-of-sample performance and Knightian uncertainty. To evaluate the distributional robustness of neural networks, we propose a first-order AA algorithm and its multi-step version. Our attack algorithms include Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD) as special cases. Furthermore, we provide a new asymptotic estimate of the adversarial accuracy against distributional threat models. The bound is fast to compute and first-order accurate, offering new insights even for the pointwise AA. It also naturally yields out-of-sample performance guarantees. We conduct numerical experiments on the CIFAR-10 dataset using DNNs on RobustBench to illustrate our theoretical results. Our code is available at https://github.com/JanObloj/W-DRO-Adversarial-Methods.
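To make the contrast between a pointwise and a distributional budget concrete, here is a minimal NumPy sketch; it is not the authors' implementation. It assumes a toy logistic model (the hypothetical helper `logistic_loss_grad_x`), perturbations measured in the l-infinity norm, and a Wasserstein budget of order p with 1/p + 1/q = 1. Each sample's step is scaled by (||g_i||_1 / Upsilon)^(q-1), which is one allocation consistent with a first-order analysis and is an assumption of this sketch. With q = 1 every sample receives the same step and the update reduces to FGSM.

```python
import numpy as np

def logistic_loss_grad_x(w, b, x, y):
    """Input gradient of log(1 + exp(-y * (x @ w + b))), labels y in {-1, +1}.

    Hypothetical toy model standing in for a neural network.
    """
    margin = y * (x @ w + b)
    coef = -y / (1.0 + np.exp(margin))      # derivative of the loss w.r.t. the logit
    return coef[:, None] * w[None, :]       # shape (n_samples, n_features)

def distributional_fgsm(x, grad, delta, q):
    """One-step l_inf attack under a Wasserstein-p budget, 1/p + 1/q = 1.

    Assumes at least one nonzero gradient so that upsilon > 0.
    """
    dual = np.abs(grad).sum(axis=1)         # l_1 norm, dual to l_inf
    upsilon = np.mean(dual ** q) ** (1.0 / q)
    scale = (dual / upsilon) ** (q - 1.0)   # non-uniform share of the budget
    x_adv = x + delta * scale[:, None] * np.sign(grad)
    return x_adv, scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(6, 4))
    y = np.where(rng.random(6) < 0.5, -1.0, 1.0)
    w = rng.normal(size=4)
    g = logistic_loss_grad_x(w, 0.0, x, y)
    _, scale = distributional_fgsm(x, g, delta=0.1, q=2.0)
    print("per-sample step multipliers:", np.round(scale, 3))
```

For q > 1 the multipliers satisfy mean(scale^p) = 1, so the average p-th power of the step sizes equals delta^p while samples with larger gradients receive larger perturbations. Clipping to a valid image range is omitted here.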

What problem does this paper attempt to address?

The paper addresses how to evaluate and improve the robustness of deep neural networks to adversarial attacks through the lens of Wasserstein distributionally robust optimization (W-DRO). Specifically, it makes four contributions:

1. **Redefining the adversarial attack problem**: The adversarial attack problem is recast in the W-DRO framework. Unlike traditional pointwise attacks, which impose the same bound on the perturbation of every input, a distributional threat model lets the attacker perturb inputs non-uniformly, constrained only by the Wasserstein distance between the clean and perturbed data distributions. The paper links these more general attacks to out-of-sample performance and Knightian uncertainty.

2. **Proposing new attack methods**: Based on the W-DRO framework, the paper proposes a first-order adversarial attack algorithm and its multi-step version. These contain the fast gradient sign method (FGSM) and projected gradient descent (PGD) as special cases, so they cover the classic pointwise attacks and extend them to distributional threat models.

3. **Providing theoretical guarantees**: The paper derives a new asymptotic estimate of the adversarial accuracy of a neural network under a distributional threat model. The estimate is fast to compute and first-order accurate, gives new insight even for pointwise attacks, and naturally yields out-of-sample performance guarantees.

4. **Experimental verification**: Numerical experiments on the CIFAR-10 dataset, using networks from RobustBench, illustrate the theoretical results and algorithms. They show that even networks that perform well under pointwise attacks can lose a significant amount of adversarial accuracy under distributional attacks.

### Formula Summary

1. **Mathematical form of the Wasserstein DRO problem**:

\[ \inf_{\theta \in \Theta} \sup_{Q \in B_\delta(P)} \mathbb{E}_Q[L(f_\theta(x), y)] \]

where \( B_\delta(P) \) is the Wasserstein ball centered at \( P \) with radius \( \delta \).

2. **First-order approximation of the adversarial loss**: Writing \( J_\theta(x, y) = L(f_\theta(x), y) \) and \( V(\delta) = \sup_{Q \in B_\delta(P)} \mathbb{E}_Q[J_\theta(x, y)] \),

\[ V(\delta) = V(0) + \delta \Upsilon + o(\delta) \]

where:

\[ \Upsilon = \left( \mathbb{E}_P \left[ \|\nabla_x J_\theta(x, y)\|_*^q \right] \right)^{1/q} \]

Here \( \|\cdot\|_* \) is the dual of the norm used to measure perturbations, and \( q \) is the conjugate exponent of the Wasserstein order \( p \), that is, \( 1/p + 1/q = 1 \). Since \( V(\delta) \) is a supremum over a growing ball, \( \Upsilon \geq 0 \).

3. **Definition of adversarial accuracy**:

\[ A_\delta = \inf_{Q \in B_\delta(P)} Q(S) = \inf_{Q \in B_\delta(P)} \mathbb{E}_Q[1_S] \]

where \( S \) is the set of pairs \( (x, y) \) that the network classifies correctly.

4. **Asymptotic lower bound of adversarial accuracy**:

\[ R_\delta \geq \frac{W(0) - V(\delta)}{W(0) - V(0)} + o(\delta) \]

where \( R_\delta = A_\delta / A_0 \) is the ratio of adversarial accuracy to clean accuracy, and:

\[ W(0) = \mathbb{E}_P[J_\theta(x, y) \mid S^c] \]

\[ V(0) = \mathbb{E}_P[J_\theta(x, y)] \]

Through these contributions, the paper provides new theoretical and practical tools for improving the robustness of neural networks when facing complex adversarial attacks.
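The formula summary can be turned into a small numerical recipe. The sketch below is an illustration under stated assumptions, not the paper's code: it takes per-sample clean losses, per-sample dual norms of the input gradient, and a correctness mask (all assumed to be precomputed elsewhere), reads V(0) as the mean clean loss and W(0) as the mean loss over misclassified samples, and replaces V(delta) by its first-order estimate V(0) + delta * Upsilon. The function name `first_order_bounds` and its return keys are hypothetical.

```python
import numpy as np

def first_order_bounds(losses, grad_dual_norms, correct, delta, q):
    """First-order estimates of the adversarial loss and accuracy bounds.

    Assumes at least one misclassified sample, so that W(0) is defined
    and W(0) > V(0).
    """
    losses = np.asarray(losses, dtype=float)
    g = np.asarray(grad_dual_norms, dtype=float)
    correct = np.asarray(correct, dtype=bool)

    v0 = losses.mean()                        # V(0): clean expected loss
    upsilon = np.mean(g ** q) ** (1.0 / q)    # sensitivity of V at delta = 0
    v_delta = v0 + delta * upsilon            # first-order estimate of V(delta)
    w0 = losses[~correct].mean()              # W(0): mean loss on misclassified samples
    ratio = (w0 - v_delta) / (w0 - v0)        # estimated lower bound on R_delta
    clean_acc = correct.mean()                # A_0

    return {
        "upsilon": float(upsilon),
        "v_delta": float(v_delta),
        "ratio_lower_bound": float(ratio),
        "adv_acc_lower_bound": float(ratio * clean_acc),
    }

if __name__ == "__main__":
    out = first_order_bounds(
        losses=[0.1, 0.2, 0.3, 2.0],
        grad_dual_norms=[1.0, 1.0, 1.0, 1.0],
        correct=[True, True, True, False],
        delta=0.1,
        q=2.0,
    )
    for name, value in out.items():
        print(f"{name}: {value:.4f}")
```

The estimate needs only one gradient evaluation per sample, which is why it is cheap compared with running a multi-step attack. For large delta the ratio can drop below zero, in which case the bound is vacuous and the first-order approximation should not be trusted.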

Wasserstein distributional robustness of neural networks
