Abstract: Adversarial examples have been shown to be a severe threat to deep neural networks (DNNs). One of the most effective adversarial defense methods is adversarial training (AT), which minimizes the adversarial risk $R_{adv}$ and thereby encourages both the benign example $x$ and its adversarially perturbed neighbors within the $\ell_{p}$-ball to be predicted as the ground-truth label. In this work, we propose a novel defense method, robust training (RT), which jointly minimizes two separate risks, $R_{stand}$ and $R_{rob}$, defined with respect to the benign example and its neighbors, respectively. The motivation is to explicitly and jointly enhance accuracy and adversarial robustness. We prove that $R_{adv}$ is upper-bounded by $R_{stand} + R_{rob}$, which implies that RT has a similar effect to AT. Intuitively, minimizing the standard risk forces the benign example to be predicted correctly, while minimizing the robust risk encourages the predictions of the neighboring examples to be consistent with the prediction of the benign example. Besides, since $R_{rob}$ is independent of the ground-truth label, RT extends naturally to a semi-supervised mode ($i.e.$, SRT) that further enhances adversarial robustness. Moreover, we extend the $\ell_{p}$-bounded neighborhood to a general case that covers different types of perturbations, such as pixel-wise ($i.e.$, $x + \delta$) or spatial ($i.e.$, $AX + b$) perturbations. Extensive experiments on benchmark datasets not only verify the superiority of the proposed SRT method over state-of-the-art methods in defending against pixel-wise or spatial perturbations separately, but also demonstrate its robustness to both perturbations simultaneously. The code for reproducing the main results is available at \url{https://github.com/THUYimingLi/Semi-supervised_Robust_Training}.
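The bound $R_{adv} \le R_{stand} + R_{rob}$ can be checked numerically on a toy problem. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes the 0-1 definitions $R_{stand} = \mathbb{E}[\mathbb{I}(C(x) \neq y)]$, $R_{rob} = \mathbb{E}[\max_{x' \in B(x)} \mathbb{I}(C(x') \neq C(x))]$ and $R_{adv} = \mathbb{E}[\max_{x' \in B(x)} \mathbb{I}(C(x') \neq y)]$, uses a hypothetical toy classifier `classify`, and approximates the $\ell_{\infty}$-ball $B(x)$ by $x$ plus random samples (a real evaluation would search the ball with an attack such as PGD). The bound holds pointwise because if $C(x') \neq y$, then either $C(x) \neq y$ or $C(x') \neq C(x)$.

```python
import random

def classify(x):
    # Hypothetical toy classifier C: predicts 1 when the first coordinate is non-negative.
    return 1 if x[0] >= 0.0 else 0

def neighbors(x, eps, n_samples, rng):
    # Approximate the l_inf ball B(x, eps) by x itself plus uniform random samples.
    # (Assumption: a real evaluation would search the ball with an attack such as PGD.)
    pts = [x]
    for _ in range(n_samples):
        pts.append(tuple(xi + rng.uniform(-eps, eps) for xi in x))
    return pts

def empirical_risks(data, eps, n_samples=50, seed=0):
    """Return (r_stand, r_rob, r_adv) under 0-1 loss for labeled pairs (x, y)."""
    rng = random.Random(seed)
    r_stand = r_rob = r_adv = 0.0
    for x, y in data:
        pred = classify(x)
        ball = neighbors(x, eps, n_samples, rng)  # same samples for both maxima
        r_stand += float(pred != y)
        # Robust risk: disagreement with the benign prediction; it never reads y.
        r_rob += max(float(classify(xp) != pred) for xp in ball)
        # Adversarial risk: some point in the ball is misclassified w.r.t. y.
        r_adv += max(float(classify(xp) != y) for xp in ball)
    n = len(data)
    return r_stand / n, r_rob / n, r_adv / n

if __name__ == "__main__":
    data = [((0.5, 0.0), 1), ((0.05, 1.0), 1), ((-0.3, 0.2), 0),
            ((-0.02, -0.5), 0), ((0.4, 0.1), 0)]
    r_stand, r_rob, r_adv = empirical_risks(data, eps=0.1)
    print(f"R_stand={r_stand:.2f}  R_rob={r_rob:.2f}  R_adv={r_adv:.2f}")
    # Upper bound from the abstract (small tolerance for float rounding).
    assert r_adv <= r_stand + r_rob + 1e-12
```

Because `neighbors` includes $x$ itself and the same sampled set is used for both maxima, the inequality also holds exactly for these sampled estimates. The `r_rob` term never reads `y`, which is what would allow it to be evaluated on unlabeled data in a semi-supervised variant.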

Adversarial Training and Robustness for Multiple Perturbations

Towards Adversarial Robustness with Multidimensional Perturbations Via Contrastive Learning

Constant Random Perturbations Provide Adversarial Robustness with Minimal Effect on Accuracy

Adversarial Training with Anti-adversaries

Towards the first adversarially robust neural network model on MNIST

Ensemble Adversarial Training: Attacks and Defenses

Are Adversarial Robustness and Common Perturbation Robustness Independent Attributes?

Can Adversarial Training Be Manipulated By Non-Robust Features?

Attacking Adversarial Attacks as A Defense

On the Robustness of Adversarial Training Against Uncertainty Attacks

Deep Repulsive Prototypes for Adversarial Robustness

Splitting the Difference on Adversarial Training

MultiRobustBench: Benchmarking Robustness Against Multiple Attacks

Toward Adversarial Robustness via Semi-supervised Robust Training

Perturbation diversity certificates robust generalization

On the Effect of Adversarial Training Against Invariance-based Adversarial Examples

Towards Robustness against Unsuspicious Adversarial Examples

Adversarially Robust Learning with Unknown Perturbation Sets

Robustness, Privacy, and Generalization of Adversarial Training

Position: Towards Resilience Against Adversarial Examples

Certified Adversarial Robustness Within Multiple Perturbation Bounds