Robust Training with Feature-Based Adversarial Example

Xuanming Fu, Zhengfeng Yang, Hao Xue, Jianlin Wang, Zhenbing Zeng
DOI: https://doi.org/10.1109/icpr56361.2022.9956608
2022-01-01
Abstract: Adversarial training is an effective defense for protecting classification models against adversarial attacks. In this paper, we reveal that a significant difference exists between the feature map of an original sample and that of its corresponding adversarial version. Based on this insight, we propose a novel robust training approach on feature-based adversarial examples, called FPAT, in which training examples are generated by maximizing a loss function between the clean and the adversarial feature maps. Extensive experiments on MNIST, SVHN, and CIFAR-10 show that the proposed method is as effective as state-of-the-art robust training methods. In particular, when the adversarial perturbation has a large radius or the number of adversarial steps used to generate training samples is small, FPAT achieves leading robustness.
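The idea of generating an example by maximizing a loss between clean and adversarial feature maps can be illustrated with a small sketch. The abstract does not specify the loss, the layer used as the feature map, or the optimizer, so everything below is an assumption made for illustration: a toy linear feature map `W x` stands in for a network layer, the loss is taken to be a squared L2 distance, and the maximization uses signed gradient ascent inside an L-infinity ball with a random start. The function names are hypothetical and are not taken from the paper.

```python
import random


def features(W, x):
    """Toy linear feature map f(x) = W x (assumed stand-in for a network layer)."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def feature_loss(W, x, x_adv):
    """Assumed loss: squared L2 distance between clean and adversarial features."""
    f_clean, f_adv = features(W, x), features(W, x_adv)
    return sum((a - c) ** 2 for a, c in zip(f_adv, f_clean))


def feature_loss_grad(W, x, x_adv):
    """Gradient of feature_loss w.r.t. x_adv: 2 * W^T (W x_adv - W x)."""
    f_clean, f_adv = features(W, x), features(W, x_adv)
    diff = [a - c for a, c in zip(f_adv, f_clean)]
    return [2.0 * sum(W[i][j] * diff[i] for i in range(len(W)))
            for j in range(len(x))]


def feature_pgd(W, x, eps=0.3, step=0.1, n_steps=5, seed=0):
    """Maximize the feature loss within an L-infinity ball of radius eps around x."""
    rng = random.Random(seed)
    # Random start: at x_adv == x the gradient of this loss is exactly zero.
    x_adv = [xi + rng.uniform(-eps, eps) for xi in x]
    for _ in range(n_steps):
        g = feature_loss_grad(W, x, x_adv)
        # Signed gradient ascent step on the feature loss.
        x_adv = [xa + step * ((gi > 0) - (gi < 0)) for xa, gi in zip(x_adv, g)]
        # Project back onto the eps-ball around the clean input.
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv


if __name__ == "__main__":
    W = [[1.0, 0.5], [-0.5, 1.0], [0.2, 0.3]]  # arbitrary illustrative weights
    x = [0.2, 0.7]
    x_adv = feature_pgd(W, x)
    print("feature loss:", feature_loss(W, x, x_adv))
```

In a robust training loop, examples produced this way would be fed to the classifier in place of (or alongside) the clean inputs; with a real network the gradient would come from automatic differentiation rather than the closed form used here.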