Abstract: Adversarial examples have been shown to be a severe threat to deep neural networks (DNNs). One of the most effective adversarial defense methods is adversarial training (AT), which minimizes the adversarial risk $R_{adv}$ and thereby encourages both the benign example $x$ and its adversarially perturbed neighbors within the $\ell_{p}$-ball to be predicted as the ground-truth label. In this work, we propose a novel defense method, robust training (RT), which jointly minimizes two separate risks, $R_{stand}$ and $R_{rob}$, defined with respect to the benign example and its neighborhood, respectively. The motivation is to explicitly and jointly enhance accuracy and adversarial robustness. We prove that $R_{adv}$ is upper-bounded by $R_{stand} + R_{rob}$, which implies that RT has a similar effect to AT. Intuitively, minimizing the standard risk encourages the benign example to be correctly predicted, while minimizing the robust risk encourages the predictions of the neighboring examples to be consistent with the prediction of the benign example. Moreover, since $R_{rob}$ is independent of the ground-truth label, RT extends naturally to the semi-supervised setting (i.e., SRT), which further enhances adversarial robustness. We also extend the $\ell_{p}$-bounded neighborhood to a general case that covers different types of perturbations, such as pixel-wise perturbations (i.e., $x + \delta$) and spatial perturbations (i.e., $AX + b$). Extensive experiments on benchmark datasets not only verify the superiority of the proposed SRT method over state-of-the-art methods in defending against pixel-wise or spatial perturbations separately, but also demonstrate its robustness to both perturbations simultaneously. The code for reproducing the main results is available at \url{https://github.com/THUYimingLi/Semi-supervised_Robust_Training}.
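
The bound $R_{adv} \le R_{stand} + R_{rob}$ stated above can be checked numerically on a toy problem. The sketch below is only an illustration and is not taken from the authors' repository. It assumes the three risks are empirical 0-1 risks: $R_{stand}$ counts benign examples that are misclassified, $R_{rob}$ counts examples with some neighbor whose prediction differs from the benign prediction, and $R_{adv}$ counts examples with some neighbor whose prediction differs from the label. The 1-D threshold classifier, the grid approximation of the neighborhood, and the names `predict`, `neighborhood`, and `empirical_risks` are all hypothetical.

```python
import random


def predict(x, threshold=0.0):
    """Toy 1-D classifier (hypothetical): label 1 if x > threshold, else 0."""
    return 1 if x > threshold else 0


def neighborhood(x, eps, steps=20):
    """Grid approximation of the interval [x - eps, x + eps], with x itself included."""
    grid = [x - eps + 2 * eps * i / steps for i in range(steps + 1)]
    return [x] + grid


def empirical_risks(data, eps):
    """Return (R_stand, R_rob, R_adv) as empirical 0-1 risks (assumed definitions)."""
    stand = rob = adv = 0
    for x, y in data:
        fx = predict(x)
        preds = [predict(xp) for xp in neighborhood(x, eps)]
        stand += fx != y                    # benign example misclassified
        rob += any(p != fx for p in preds)  # some neighbor disagrees with f(x); no label used
        adv += any(p != y for p in preds)   # some neighbor disagrees with the label
    n = len(data)
    return stand / n, rob / n, adv / n


# Synthetic data: the true label is the sign of x, with 10% label noise.
random.seed(0)
data = []
for _ in range(500):
    x = random.uniform(-1.0, 1.0)
    y = 1 if x > 0 else 0
    if random.random() < 0.1:
        y = 1 - y
    data.append((x, y))

r_stand, r_rob, r_adv = empirical_risks(data, eps=0.1)
print(f"R_stand={r_stand:.3f}  R_rob={r_rob:.3f}  R_adv={r_adv:.3f}")
print("bound holds:", r_adv <= r_stand + r_rob + 1e-12)
```

Under these assumed definitions the inequality holds example by example: if a neighbor's prediction differs from the label, then either it also differs from the benign prediction, or the benign prediction itself is wrong. Note also that the `rob` count never reads `y`, which is the property that makes the semi-supervised extension possible.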

Fairness is Essential for Robustness: Fair Adversarial Training by Identifying and Augmenting Hard Examples

Attack As Defense: Characterizing Adversarial Examples Using Robustness

To be Robust or to be Fair: Towards Fairness in Adversarial Training

Improving Robust Fairness via Balance Adversarial Training

Hard Adversarial Example Mining for Improving Robust Fairness

Towards Fairness-Aware Adversarial Learning

Boosting Adversarial Training in Safety-Critical Systems Through Boundary Data Selection

To be Robust and to be Fair: Aligning Fairness with Robustness

Feature Augmentation for Adversarial Robustness

FAIR-TAT: Improving Model Fairness Using Targeted Adversarial Training

CFA: Class-wise Calibrated Fair Adversarial Training

RobustFair: Adversarial Evaluation through Fairness Confusion Directed Gradient Search

Enhancing Robust Representation in Adversarial Training: Alignment and Exclusion Criteria

Push Stricter to Decide Better: A Class-Conditional Feature Adaptive Framework for Improving Adversarial Robustness

Adversarial Training with Anti-adversaries

How Robust is your Fair Model? Exploring the Robustness of Diverse Fairness Strategies

Improved Adversarial Training Through Adaptive Instance-wise Loss Smoothing

Learning More Robust Features with Adversarial Training

Toward Adversarial Robustness via Semi-supervised Robust Training

Class Aware Robust Training

Analysis and Applications of Class-wise Robustness in Adversarial Training