Weighted Adaptive Perturbations Adversarial Training for Improving Robustness

Yan Wang, Dongmei Zhang, Haiyang Zhang
DOI: https://doi.org/10.1007/978-3-031-20865-2_30
2022-01-01
Abstract: Adversarial Training (AT) is one of the most effective defenses against adversarial examples: the model is trained on both clean and adversarial examples. Although AT improves robustness by smoothing the model's behavior within a small neighborhood of each input, it reduces accuracy on clean examples. Motivated by the adaptive learning rates used in model optimizers, we propose Weighted Adaptive Perturbation Adversarial Training (WAPAT) to reduce this loss of clean accuracy while improving robustness. In the adversarial-example generation stage of adversarial training, we introduce weights based on feature changes that adaptively adjust the perturbation step size for each feature. During the iterative attack, if a feature is attacked frequently, we increase the attack strength in that area; otherwise, we weaken it. WAPAT is a data augmentation method that shortens the distance between adversarial examples and the classification boundary. The generated adversarial examples remain effective as attacks while retaining more information from the clean examples, so they help us obtain a more robust model with a smaller loss of recognition accuracy on clean examples. To demonstrate our method, we implement WAPAT in three adversarial training frameworks. Experimental results on CIFAR-10 and MNIST show that WAPAT significantly improves adversarial robustness with less sacrifice of accuracy.
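The abstract does not give WAPAT's exact weighting rule, so the following is only a hypothetical sketch of the general idea of a feature-weighted iterative attack. It assumes a binary logistic-regression model as a stand-in for the paper's networks, an L-infinity budget `eps`, and a per-feature weight equal to an exponential moving average (decay `beta`) of how much each feature actually changed in earlier iterations, normalized to mean 1. Features that keep moving receive larger steps; features that stop moving (for example, ones pinned at the `eps` boundary) receive smaller ones. The names `weighted_adaptive_pgd`, `bce_loss`, `activity`, and `beta` and all default values are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def weighted_adaptive_pgd(x, y, w, b, eps=0.3, alpha=0.05, steps=10, beta=0.9):
    """Hypothetical WAPAT-style iterative attack on a binary logistic-regression model.

    x: clean input of shape (d,) with values in [0, 1]; y: label in {0, 1};
    w, b: model weights and bias. Returns an adversarial input inside the
    L-infinity ball of radius eps around x, clipped to [0, 1].
    """
    x = x.astype(float)
    x_adv = x.copy()
    # Assumed weighting rule: EMA of each feature's actual per-step change.
    activity = np.zeros_like(x_adv)
    for _ in range(steps):
        # Input gradient of binary cross-entropy: (sigmoid(z) - y) * w
        p = 1.0 / (1.0 + np.exp(-(x_adv @ w + b)))
        grad = (p - y) * w
        # Per-feature weights normalized to mean 1; uniform until some feature has moved.
        if activity.sum() > 0:
            weights = activity / activity.mean()
        else:
            weights = np.ones_like(x_adv)
        # Weighted sign step: features that changed more recently get larger steps.
        x_new = x_adv + alpha * weights * np.sign(grad)
        # Project onto the eps-ball around x, then onto the valid input range.
        x_new = np.clip(x_new, x - eps, x + eps)
        x_new = np.clip(x_new, 0.0, 1.0)
        # Record how much each feature actually moved in this iteration.
        activity = beta * activity + (1 - beta) * np.abs(x_new - x_adv)
        x_adv = x_new
    return x_adv


def bce_loss(x, y, w, b):
    """Numerically stable binary cross-entropy of the logistic model on one input."""
    z = x @ w + b
    return np.logaddexp(0.0, z) - y * z


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 1.0, 20)
    w = rng.normal(size=20)
    x_adv = weighted_adaptive_pgd(x, 1, w, 0.0)
    print("clean loss:", bce_loss(x, 1, w, 0.0))
    print("adversarial loss:", bce_loss(x_adv, 1, w, 0.0))
    print("max perturbation:", np.max(np.abs(x_adv - x)))
```

In an adversarial training loop, `x_adv` would replace or be combined with the clean input when computing the training loss; how WAPAT combines them in each of its three frameworks is not specified in the abstract. Normalizing the weights to mean 1 keeps the average step size at `alpha`, so the sketch redistributes attack strength across features rather than simply strengthening the whole attack.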