Abstract: Adversarial training is one of the most effective approaches to improving model robustness against adversarial examples. However, previous works mainly focus on the overall robustness of the model, and an in-depth analysis of the role each class plays in adversarial training is still missing. In this paper, we analyze class-wise robustness in adversarial training. First, we provide a detailed diagnosis of adversarial training on six benchmark datasets, namely MNIST, CIFAR-10, CIFAR-100, SVHN, STL-10 and ImageNet. Surprisingly, we find remarkable robustness discrepancies among classes, which lead to unbalanced (unfair) class-wise robustness in robust models. We further investigate the relations between classes and find that this unbalanced class-wise robustness is largely consistent across different attack and defense methods. Moreover, we observe that stronger attack methods in adversarial learning gain their improvement mainly by attacking the vulnerable classes (i.e., classes with lower robustness) more successfully. Inspired by these findings, we design a simple but effective attack method based on the traditional PGD attack, named the Temperature-PGD attack, which enlarges the robustness disparity among classes by applying a temperature factor to the confidence distribution of each image. Experiments demonstrate that our method achieves a higher attack success rate than the PGD attack. Furthermore, from the defense perspective, we make modifications to the training and inference phases to improve the robustness of the most vulnerable class, thereby mitigating the large difference in class-wise robustness. We believe our work contributes to a more comprehensive understanding of adversarial training and encourages rethinking the class-wise properties of robust models.
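To make the two central ideas more concrete, the sketch below shows a temperature-scaled PGD attack and a per-class robust accuracy measurement. It is a minimal illustration under stated assumptions and not the authors' implementation. It assumes a toy linear softmax classifier (logits = x @ W + b) and assumes the temperature divides the logits inside the cross-entropy loss. The abstract does not give the exact form, the temperature value, or the attack hyperparameters. The names `temperature_pgd` and `class_wise_accuracy` and the values of `eps`, `alpha` and `steps` are hypothetical. Setting `temperature=1.0` recovers ordinary L-infinity PGD.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def temperature_pgd(x, y, W, b, eps=0.3, alpha=0.05, steps=20, temperature=1.0):
    """Hypothetical sketch: L-infinity PGD on a linear softmax classifier.

    Assumption (not from the source): the temperature divides the logits
    before the softmax in the cross-entropy loss; temperature=1.0 is plain PGD.
    """
    x_adv = x.copy()
    onehot = np.eye(W.shape[1])[y]
    for _ in range(steps):
        logits = x_adv @ W + b
        # Gradient of CE(softmax(logits / T), y) with respect to the logits.
        grad_logits = (softmax(logits / temperature) - onehot) / temperature
        # Chain rule through logits = x @ W + b.
        grad_x = grad_logits @ W.T
        # Signed ascent step on the loss, then project into the eps-ball and [0, 1].
        x_adv = x_adv + alpha * np.sign(grad_x)
        x_adv = np.clip(x_adv, x - eps, x + eps)
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv


def class_wise_accuracy(pred, y, num_classes):
    """Accuracy computed separately for each true class (NaN if a class is absent)."""
    acc = np.full(num_classes, np.nan)
    for c in range(num_classes):
        mask = y == c
        if mask.any():
            acc[c] = (pred[mask] == c).mean()
    return acc


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W, b = rng.normal(size=(10, 3)), np.zeros(3)
    x = rng.uniform(size=(300, 10))
    y = (x @ W + b).argmax(axis=1)  # use clean predictions as labels
    for T in (1.0, 5.0):
        x_adv = temperature_pgd(x, y, W, b, temperature=T)
        pred = (x_adv @ W + b).argmax(axis=1)
        print(f"T={T}: per-class robust accuracy {class_wise_accuracy(pred, y, 3)}")
```

Reporting accuracy per class rather than as a single average shows which classes an attack breaks most easily. That is the kind of discrepancy the abstract describes. The toy linear model only keeps the gradient analytic. A real study would compute the same loss gradient with automatic differentiation on a trained network.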

Impact of Attention on Adversarial Robustness of Image Classification Models

Impact of White-Box Adversarial Attacks on Convolutional Neural Networks

Strengthening Robustness Under Adversarial Attacks Using Brain Visual Codes

Associative Adversarial Learning Based on Selective Attack

A Comprehensive Study on Robustness of Image Classification Models: Benchmarking and Rethinking

Evaluating Adversarial Robustness on Document Image Classification

Targeted Black-Box Adversarial Attack Method for Image Classification Models

Attention, Please! Adversarial Defense via Activation Rectification and Preservation

Improving Adversarial Robustness of 3D Point Cloud Classification Models

Robust Superpixel-Guided Attentional Adversarial Attack

Benchmarking Adversarial Robustness on Image Classification

Evaluating Adversarial Robustness: A Comparison Of FGSM, Carlini-Wagner Attacks, And The Role of Distillation as Defense Mechanism

A Simplified Heuristic Version of Raviv's Algorithm for Using Context in Text Recognition

Towards Robustness against Unsuspicious Adversarial Examples

Not So Robust After All: Evaluating the Robustness of Deep Neural Networks to Unseen Adversarial Attacks

Robustness and Transferability of Adversarial Attacks on Different Image Classification Neural Networks

Analysis and Applications of Class-wise Robustness in Adversarial Training

Towards Class-wise Robustness Analysis

A Survey of Neural Network Robustness Assessment in Image Recognition

Give Me Your Attention: Dot-Product Attention Considered Harmful for Adversarial Patch Robustness

AICAttack: Adversarial Image Captioning Attack with Attention-Based Optimization