Class-Balanced Universal Perturbations for Adversarial Training

Kexue Ma, Guitao Cao, Mengqian Xu, Chunwei Wu, Hong Wang, Wenming Cao
DOI: https://doi.org/10.1109/IJCNN54540.2023.10191447
2023-01-01
Abstract: A universal attack generates an image-agnostic perturbation, called a universal adversarial perturbation (UAP), which can be added to every sample in the data distribution to fool the classifier. However, a single universal perturbation tends to mislead the classifier into assigning most adversarial examples the same label, which leaves attack strength imbalanced across classes. In this paper, we propose class-balanced UAPs that increase the dispersion of the labels predicted for adversarial examples. To ensure attack strength and class balance simultaneously, we design a novel diversity objective consisting of probability calibration and a penalty regularizer, which accounts for both the distribution of predicted labels across samples and the distribution of predicted probabilities within each sample. Furthermore, because class-balanced UAPs provide diverse perturbation directions, we apply class-balanced attacks in adversarial training to defend against universal perturbations. Correspondingly, we reformulate adversarial training from a min-max optimization problem into a new two-stage framework. Experiments on several benchmark datasets show that the class-balanced attack outperforms the standard universal attack, and that adversarial training with class-balanced UAPs achieves state-of-the-art clean accuracy and robustness to universal perturbations.
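The class imbalance described in the abstract can be made concrete with a small sketch. It assumes an L∞-bounded perturbation, which is a common UAP setting but is not stated in the abstract. It also uses the Shannon entropy of the predicted-label histogram as a stand-in for "dispersion." That entropy measure is an illustrative assumption: this is not the paper's diversity objective, whose probability calibration and penalty regularizer are not specified here. The helpers below apply one shared perturbation to every sample and score how spread out the resulting predictions are. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def apply_universal_perturbation(
    images: Sequence[Sequence[float]], delta: Sequence[float], eps: float
) -> List[List[float]]:
    """Add one shared perturbation to every (flattened) image.

    Assumption: the perturbation is projected onto an L-infinity ball of
    radius eps, and pixel values live in [0, 1].
    """
    # Project the universal perturbation onto the L-inf ball of radius eps.
    projected = [max(-eps, min(eps, d)) for d in delta]
    # The same perturbation is added to every sample, then clipped to [0, 1].
    return [
        [max(0.0, min(1.0, x + d)) for x, d in zip(img, projected)]
        for img in images
    ]


def label_entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in nats) of the empirical predicted-label distribution."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def normalized_label_dispersion(labels: Sequence[Hashable], num_classes: int) -> float:
    """Entropy divided by its maximum, log(num_classes), so the result is in [0, 1].

    Near 0: most adversarial examples collapse onto one label (class imbalance).
    Near 1: predictions are spread evenly across all classes.
    """
    if num_classes < 2:
        return 0.0
    return label_entropy(labels) / math.log(num_classes)


if __name__ == "__main__":
    # Hypothetical predicted labels for adversarial examples on a 10-class task.
    collapsed = [7, 7, 7, 7, 7, 7, 7, 2]  # most examples pushed to label 7
    spread = [0, 1, 2, 3, 4, 5, 6, 7]     # predictions spread over many classes
    print(normalized_label_dispersion(collapsed, 10))
    print(normalized_label_dispersion(spread, 10))
```

Under these assumptions, a standard UAP that drives most inputs toward one dominant label would score low on this measure. A class-balanced UAP, in the abstract's sense, aims for a higher score while still fooling the classifier.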