Abstract: Adversarial examples have been shown to pose a severe threat to deep neural networks (DNNs). One of the most effective adversarial defense methods is adversarial training (AT), which minimizes the adversarial risk $R_{adv}$ and thereby encourages both the benign example $x$ and its adversarially perturbed neighbors within the $\ell_{p}$-ball to be predicted as the ground-truth label. In this work, we propose a novel defense method, robust training (RT), which jointly minimizes two separate risks, $R_{stand}$ and $R_{rob}$, defined with respect to the benign example and its neighbors, respectively. The motivation is to explicitly and jointly enhance accuracy and adversarial robustness. We prove that $R_{adv}$ is upper-bounded by $R_{stand} + R_{rob}$, which implies that RT has a similar effect to AT. Intuitively, minimizing the standard risk enforces that the benign example is correctly predicted, while minimizing the robust risk encourages the predictions of the neighboring examples to be consistent with the prediction of the benign example. In addition, since $R_{rob}$ is independent of the ground-truth label, RT extends naturally to a semi-supervised mode ($i.e.$, SRT) that further enhances adversarial robustness. Moreover, we extend the $\ell_{p}$-bounded neighborhood to a general case that covers different types of perturbations, such as pixel-wise perturbations ($i.e.$, $x + \delta$) or spatial perturbations ($i.e.$, $AX + b$). Extensive experiments on benchmark datasets not only verify the superiority of the proposed SRT method over state-of-the-art methods in defending against pixel-wise or spatial perturbations separately, but also demonstrate its robustness to both perturbations simultaneously. The code for reproducing the main results is available at \url{https://github.com/THUYimingLi/Semi-supervised_Robust_Training}.
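The upper bound on $R_{adv}$ can be seen with a short argument. The definitions below are an illustrative assumption (0-1 losses, a classifier $f$, and a generic neighborhood $\mathcal{B}(x)$ such as the $\ell_{p}$-ball); the paper's exact formulation may differ, for example by using surrogate losses for training.

```latex
% Assumed 0-1 definitions (illustrative, not necessarily the paper's notation):
R_{adv}(f)   = \mathbb{E}_{(x,y)}\Big[\max_{x' \in \mathcal{B}(x)} \mathbb{1}\{f(x') \neq y\}\Big], \qquad
R_{stand}(f) = \mathbb{E}_{(x,y)}\big[\mathbb{1}\{f(x) \neq y\}\big], \qquad
R_{rob}(f)   = \mathbb{E}_{x}\Big[\max_{x' \in \mathcal{B}(x)} \mathbb{1}\{f(x') \neq f(x)\}\Big].

% If f(x') \neq y, then f(x) \neq y or f(x') \neq f(x); hence, for every x' \in \mathcal{B}(x),
\mathbb{1}\{f(x') \neq y\} \le \mathbb{1}\{f(x) \neq y\} + \mathbb{1}\{f(x') \neq f(x)\}.

% The first term on the right does not depend on x', so maximizing over x' \in \mathcal{B}(x)
% and taking the expectation over (x, y) gives
R_{adv}(f) \le R_{stand}(f) + R_{rob}(f).
```

Under these definitions $R_{rob}$ involves only $x$ and $f(x)$, never $y$, which is consistent with the abstract's point that the robust term can also be estimated on unlabeled data.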

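The same bound can be checked numerically. The sketch below is a hypothetical toy, not the authors' code: a fixed linear classifier on 2-D points, 0-1 losses, and a finite grid standing in for the $\ell_{\infty}$-ball. The names `predict`, `neighborhood`, `standard_risk`, `robust_risk`, and `adversarial_risk` are illustrative assumptions, and the script only evaluates the risks of a fixed model rather than training one.

```python
import itertools

# Hypothetical toy classifier (not the authors' model): f(x) = 1 if w . x + c > 0 else 0.
W = (1.0, -1.0)
C = 0.0


def predict(x):
    """Predicted label of a 2-D point under the fixed linear classifier."""
    return 1 if W[0] * x[0] + W[1] * x[1] + C > 0 else 0


def neighborhood(x, eps, steps=5):
    """Finite grid of points approximating the l_inf ball of radius eps around x."""
    offsets = [-eps + 2 * eps * i / (steps - 1) for i in range(steps)]
    return [(x[0] + dx, x[1] + dy) for dx, dy in itertools.product(offsets, offsets)]


def standard_risk(labeled):
    """Empirical R_stand: fraction of benign examples that are misclassified."""
    return sum(predict(x) != y for x, y in labeled) / len(labeled)


def robust_risk(points, eps):
    """Empirical R_rob: fraction of points having some neighbor whose prediction
    differs from the prediction at the point itself. No labels are used."""
    return sum(
        any(predict(z) != predict(x) for z in neighborhood(x, eps)) for x in points
    ) / len(points)


def adversarial_risk(labeled, eps):
    """Empirical R_adv: fraction of examples with some neighbor not predicted as y."""
    return sum(
        any(predict(z) != y for z in neighborhood(x, eps)) for x, y in labeled
    ) / len(labeled)


labeled = [((2.0, 0.0), 1), ((0.0, 2.0), 0), ((0.1, 0.0), 1), ((0.0, 0.1), 1)]
unlabeled = [(3.0, 1.0), (-1.0, 0.05)]
eps = 0.2

r_adv = adversarial_risk(labeled, eps)             # 0.5
r_stand = standard_risk(labeled)                   # 0.25
r_rob = robust_risk([x for x, _ in labeled], eps)  # 0.5
assert r_adv <= r_stand + r_rob  # the upper bound holds on this toy data

# Semi-supervised variant: the label-free R_rob can also use unlabeled points.
r_rob_semi = robust_risk([x for x, _ in labeled] + unlabeled, eps)  # 2/6
print(r_adv, r_stand, r_rob, r_rob_semi)
```

Actual training with RT or SRT would presumably replace the 0-1 indicators with differentiable surrogate losses and the exhaustive grid with an attack that searches the neighborhood; the grid is used here only so that every quantity can be verified by hand.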
Improving Adversarial Robustness Requires Revisiting Misclassified Examples

Attack As Defense: Characterizing Adversarial Examples Using Robustness

Improving Adversarial Robustness of 3D Point Cloud Classification Models

Improving Model Robustness Against Adversarial Examples with Redundant Fully Connected Layer

Adaptive Retraining for Neural Network Robustness in Classification

Improving Adversarial Robustness via Attention and Adversarial Logit Pairing

Minimizing Adversarial Training Samples for Robust Image Classifiers: Analysis and Adversarial Example Generator Design

Toward Adversarial Robustness via Semi-supervised Robust Training

Towards Robustness against Unsuspicious Adversarial Examples

Towards Deep Learning Models Resistant to Adversarial Attacks

Adversarial robustness improvement for deep neural networks

Deep Defense: Training DNNs with Improved Adversarial Robustness

Attacking Adversarial Attacks as A Defense

DeepDefense: Training Deep Neural Networks with Improved Robustness

Class aware robust training

Towards Robust Training of Neural Networks by Regularizing Adversarial Gradients

Towards Robust Detection of Adversarial Examples

Adversarial Training Can Provably Improve Robustness: Theoretical Analysis of Feature Learning Process Under Structured Data

Towards Robustifying Image Classifiers against the Perils of Adversarial Attacks on Artificial Intelligence Systems

WAT: Improve the Worst-class Robustness in Adversarial Training