Minimizing Adversarial Training Samples for Robust Image Classifiers: Analysis and Adversarial Example Generator Design

Yulong Wang, Tong Sun, Xin Yuan, Shenghong Li, Wei Ni
DOI: https://doi.org/10.1109/tifs.2024.3474973
IF: 7.231
2024-10-22
IEEE Transactions on Information Forensics and Security
Abstract: Training deep neural networks (DNNs) with altered data, known as adversarial training, is essential for improving their robustness. A significant challenge emerges as the robustness strengthened during training often diminishes during inference, resulting in pronounced drops in robust accuracy. Contemporary strategies either necessitate excessively large training data or risk compromising the natural accuracy on non-adversarial images. Our analysis identifies that the inherent vulnerability of DNNs to adversarial attacks stems from certain input space segments that are inadequately populated by training data, leading to decision-making voids with incorrect predictions. The minimum number of training samples required for successful adversarial training can be attained by maximizing the representativeness of the samples. In light of this, we put forth an advanced training data augmentation method anchored on a Generative Adversarial Network. The generated samples are evaluated by the image classifier during training and selected based on their confidence scores. Evaluations on public datasets, such as Tiny-ImageNet, MS COCO, and CIFAR-100, using various DNNs, including Vision Transformer, MobileNet, and WideResNet, under recent attacks like DifAttack, SQBA, and AutoAttack, confirm that our method significantly enhances the adversarial robustness of DNN image classifiers. Our method outperforms state-of-the-art adversarial training methods by 35.66% on Tiny-ImageNet, 13.53% on MS COCO, and 13.06% on CIFAR-100.
computer science, theory & methods; engineering, electrical & electronic