Abstract: Adversarial examples have been shown to be a severe threat to deep neural networks (DNNs). One of the most effective adversarial defense methods is adversarial training (AT), which minimizes the adversarial risk $R_{adv}$ and thereby encourages both the benign example $x$ and its adversarially perturbed neighbors within the $\ell_p$-ball to be predicted as the ground-truth label. In this paper, we propose a novel defense method, robust training (RT), which jointly minimizes two separate risks, $R_{stand}$ and $R_{rob}$, defined with respect to the benign example and its neighborhood, respectively. The motivation is to enhance accuracy and adversarial robustness explicitly and jointly. We prove that $R_{adv}$ is upper-bounded by $R_{stand} + R_{rob}$, which implies that RT has a similar effect to AT. Intuitively, minimizing the standard risk encourages the benign example to be correctly predicted, while minimizing the robust risk encourages the predictions of the neighboring examples to be consistent with the prediction of the benign example. Moreover, since $R_{rob}$ is independent of the ground-truth label, RT extends naturally to the semi-supervised setting (i.e., SRT), which further enhances its effectiveness. We also extend the $\ell_p$-bounded neighborhood to a general case that covers different types of perturbations, such as pixel-wise perturbations (i.e., $x + \delta$) and spatial perturbations (i.e., $Ax + b$). Extensive experiments on benchmark datasets not only verify the superiority of the proposed SRT over state-of-the-art methods in defending against pixel-wise or spatial perturbations separately, but also demonstrate its robustness to both perturbations simultaneously. Our work may shed light on the understanding of universal model robustness and the potential of unlabeled samples. The code for reproducing the main results is available at https://github.com/THUYimingLi/Semi-supervised_Robust_Training. © 2021 Elsevier Ltd. All rights reserved.
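
The bound stated in the abstract can be illustrated on a toy case. The sketch below is not the paper's implementation: it assumes 0-1 losses, a binary linear classifier sign(w·x + b), and a closed l_inf ball of radius eps, for which the worst case over the ball can be computed exactly. The three risk definitions are inferred from the abstract's description (the benign example is misclassified; some neighbor disagrees with the benign prediction; some neighbor disagrees with the label), and the function name `linear_risks` and the data are hypothetical.

```python
def linear_risks(w, b, X, y, eps):
    """Exact 0-1 risks of sign(w.x + b) under a closed l_inf ball of radius eps.

    Assumed definitions (inferred from the abstract, not copied from the paper):
      R_stand: fraction of benign examples whose prediction differs from the label
      R_rob:   fraction of examples with a neighbor predicted differently from x
      R_adv:   fraction of examples with a neighbor predicted differently from y
    Labels and predictions take values in {-1, +1}.
    """
    # Inside the l_inf ball the score w.x + b can move by at most eps * ||w||_1.
    reach = eps * sum(abs(wi) for wi in w)
    n_wrong = n_flip = n_adv = 0
    for xi, yi in zip(X, y):
        score = sum(wi * v for wi, v in zip(w, xi)) + b
        pred = 1 if score >= 0 else -1
        wrong = pred != yi
        # A neighbor changes the prediction iff the score can cross zero.
        flips = (score < reach) if score >= 0 else (-score <= reach)
        n_wrong += wrong
        n_flip += flips
        # x lies in its own ball, so a wrong benign prediction already counts;
        # otherwise pred == y and a flipped neighbor disagrees with y.
        n_adv += wrong or flips
    n = len(X)
    return n_wrong / n, n_flip / n, n_adv / n


# Hypothetical toy data: four 2-D points, the third one mislabeled by the model.
w, b = [1.0, 0.0], 0.0
X = [[2.0, 0.0], [0.5, 0.0], [-0.2, 0.0], [-3.0, 0.0]]
y = [1, 1, 1, -1]

r_stand, r_rob, r_adv = linear_risks(w, b, X, y, eps=1.0)
print((r_stand, r_rob, r_adv))  # prints (0.25, 0.5, 0.5)
```

Here the adversarial risk (0.5) lies between the standard risk (0.25) and the sum of the standard and robust risks (0.75). The robust term never consults `y`, which is the property the abstract relies on when extending RT to unlabeled data.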

Towards Adversarial Robustness with Multidimensional Perturbations Via Contrastive Learning

Attack As Defense: Characterizing Adversarial Examples Using Robustness

Improving Adversarial Robustness of 3D Point Cloud Classification Models

Adversarial Training and Robustness for Multiple Perturbations

Adversarial Distributional Training for Robust Deep Learning

Feature Augmentation for Adversarial Robustness

Towards Robustness against Unsuspicious Adversarial Examples

Enhancing Robust Representation in Adversarial Training: Alignment and Exclusion Criteria

Toward Adversarial Robustness via Semi-supervised Robust Training

Improving Adversarial Robustness via Attention and Adversarial Logit Pairing

Improving Adversarial Robustness Requires Revisiting Misclassified Examples

Understanding Adversarial Robustness from Feature Maps of Convolutional Layers

Are Adversarial Robustness and Common Perturbation Robustness Independent Attributes?

Towards A Unified Min-Max Framework for Adversarial Exploration and Robustness

Strength-Adaptive Adversarial Training

Learning Universal Adversarial Perturbation by Adversarial Example

Towards Improving Robustness Against Common Corruptions in Object Detectors Using Adversarial Contrastive Learning

Towards the first adversarially robust neural network model on MNIST

CAT: Customized Adversarial Training for Improved Robustness

Attacking Adversarial Attacks as A Defense

Semi-supervised Robust Training with Generalized Perturbed Neighborhood