Abstract: Adversarial examples have been shown to be a severe threat to deep neural networks (DNNs). One of the most effective adversarial defense methods is adversarial training (AT), which minimizes the adversarial risk $R_{adv}$ and thereby encourages both the benign example $x$ and its adversarially perturbed neighbors within the $\ell_{p}$-ball to be predicted as the ground-truth label. In this work, we propose a novel defense method, robust training (RT), which jointly minimizes two separate risks, $R_{stand}$ and $R_{rob}$, defined with respect to the benign example and its neighborhood, respectively. The motivation is to explicitly and jointly enhance accuracy and adversarial robustness. We prove that $R_{adv}$ is upper-bounded by $R_{stand} + R_{rob}$, which implies that RT has a similar effect to AT. Intuitively, minimizing the standard risk encourages the benign example to be correctly predicted, while minimizing the robust risk encourages the predictions on neighboring examples to be consistent with the prediction on the benign example. Moreover, since $R_{rob}$ is independent of the ground-truth label, RT extends naturally to the semi-supervised setting (i.e., SRT), which further enhances adversarial robustness. We also extend the $\ell_{p}$-bounded neighborhood to a general case that covers different types of perturbations, such as pixel-wise perturbations (i.e., $x + \delta$) and spatial perturbations (i.e., $AX + b$). Extensive experiments on benchmark datasets not only verify the superiority of the proposed SRT method over state-of-the-art methods in defending against pixel-wise or spatial perturbations separately, but also demonstrate its robustness to both types of perturbation simultaneously. The code for reproducing the main results is available at https://github.com/THUYimingLi/Semi-supervised_Robust_Training.
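The bound $R_{adv} \le R_{stand} + R_{rob}$ stated in the abstract can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: the 1-D classifier, the labeling rule, the grid-based neighborhood, and all function names are hypothetical stand-ins chosen for illustration, and the three empirical risks are computed under the assumption that each is a 0-1 risk over a finite set of neighbors.

```python
import random

def classifier(x):
    # Hypothetical 1-D classifier whose decision boundary (0.3) is
    # slightly misplaced relative to the true boundary (0.0).
    return 1 if x > 0.3 else -1

def true_label(x):
    # Assumed ground-truth labeling rule for this toy problem.
    return 1 if x > 0.0 else -1

def neighbors(x, eps, k=21):
    # Finite grid over [x - eps, x + eps]; stands in for the l_p-ball.
    return [x - eps + 2 * eps * i / (k - 1) for i in range(k)]

def empirical_risks(xs, eps):
    n = len(xs)
    stand = rob = adv = 0
    for x in xs:
        y = true_label(x)
        fx = classifier(x)
        preds = [classifier(xp) for xp in neighbors(x, eps)]
        # Standard risk: the benign example itself is misclassified.
        stand += fx != y
        # Robust risk: some neighbor disagrees with the benign prediction.
        # Note that this term never uses the label y.
        rob += any(p != fx for p in preds)
        # Adversarial risk: some neighbor disagrees with the label.
        adv += any(p != y for p in preds)
    return stand / n, rob / n, adv / n

rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(1000)]
r_stand, r_rob, r_adv = empirical_risks(xs, eps=0.1)
print(f"R_stand={r_stand:.3f}  R_rob={r_rob:.3f}  R_adv={r_adv:.3f}")
print("bound holds:", r_adv <= r_stand + r_rob)
```

In this sketch the bound holds per example: if some neighbor is predicted differently from the label, then either the benign prediction is already wrong, or it is correct and that neighbor disagrees with it. Because the robust term does not touch `y`, it could also be evaluated on unlabeled points, which is the property the abstract uses to motivate the semi-supervised variant.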

Robust Training with Feature-Based Adversarial Example

An Adversarial Attack Via Feature Contributive Regions

Feature Augmentation for Adversarial Robustness

Learning More Robust Features with Adversarial Training

ATRA: Efficient Adversarial Training with High-Robust Area

VTFR-AT: Adversarial Training with Visual Transformation and Feature Robustness

An adversarial defense algorithm based on robust U-net

Robust Local Features for Improving the Generalization of Adversarial Training

Adversarial Training Can Provably Improve Robustness: Theoretical Analysis of Feature Learning Process Under Structured Data

Exploring Robust Features for Improving Adversarial Robustness

Weighted Adaptive Perturbations Adversarial Training for Improving Robustness

Toward Adversarial Robustness via Semi-supervised Robust Training

Enhancing Robust Representation in Adversarial Training: Alignment and Exclusion Criteria

Fairness is Essential for Robustness: Fair Adversarial Training by Identifying and Augmenting Hard Examples

Push Stricter to Decide Better: A Class-Conditional Feature Adaptive Framework for Improving Adversarial Robustness

Enhancing Intrinsic Adversarial Robustness via Feature Pyramid Decoder

Towards Adversarial Robustness with Multidimensional Perturbations Via Contrastive Learning

Towards Robust Detection of Adversarial Examples

Towards Robust Training of Neural Networks by Regularizing Adversarial Gradients

Two-Way Feature-Aligned and Attention-Rectified Adversarial Training

FePN: A Robust Feature Purification Network to Defend Against Adversarial Examples