Toward Universal Detection of Adversarial Examples Via Pseudorandom Classifiers

Boyu Zhu, Changyu Dong, Yuan Zhang, Yunlong Mao, Sheng Zhong
DOI: https://doi.org/10.1109/tifs.2023.3340889
IF: 7.231
2024-01-01
IEEE Transactions on Information Forensics and Security
Abstract: Adversarial examples that can fool neural network classifiers have attracted much attention. Existing approaches to detecting adversarial examples rely on a supervised scheme that generates attacks (either targeted or non-targeted) to train the detectors, which means the detectors are geared to the attacks chosen at training time and could be circumvented if the adversary does not act as expected. In this paper, we borrow ideas from cryptography and present a novel approach called the pseudorandom classifier. In a nutshell, a pseudorandom classifier is a classifier equipped with a mapping that encodes the category labels into random multi-bit labels, and a keyed pseudorandom injective function that transforms the input to the classifier. The multi-bit labels enable attack-independent, probabilistic detection of whether the input sample is adversarial. The pseudorandom injection makes the existing white-box adversarial example generation methods, which are largely based on back-propagation, no longer applicable. We empirically evaluate our method on MNIST, CIFAR10, Imagenette, CIFAR100, and GTSRB. The results suggest that its performance against adversarial examples is comparable to the state-of-the-art.
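The two ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the helper names (make_codebook, keyed_permutation, transform, detect) are hypothetical, the keyed permutation derived from HMAC-SHA256 is only an assumed stand-in for the paper's keyed pseudorandom injective function, and the neural network itself is omitted. The sketch assumes the classifier emits a multi-bit label, and that an input is flagged as adversarial when the emitted bits do not match any valid codeword.

```python
import hashlib
import hmac
import random


def make_codebook(num_classes, num_bits, seed):
    """Assign each class a distinct random multi-bit label (hypothetical helper)."""
    rng = random.Random(seed)
    codes = rng.sample(range(2 ** num_bits), num_classes)  # distinct codewords
    return {cls: code for cls, code in enumerate(codes)}


def keyed_permutation(key, n):
    """Derive a permutation of n positions from a secret key.

    Assumption: a permutation is used here only because it is a simple
    injective map; the paper's actual function may differ.
    """
    tags = [hmac.new(key, i.to_bytes(4, "big"), hashlib.sha256).digest()
            for i in range(n)]
    return sorted(range(n), key=lambda i: tags[i])


def transform(key, x):
    """Apply the keyed permutation to a flat input before classification."""
    perm = keyed_permutation(key, len(x))
    return [x[i] for i in perm]


def detect(predicted_code, codebook):
    """Return (class, False) for a valid codeword, else (None, True)."""
    inverse = {code: cls for cls, code in codebook.items()}
    if predicted_code in inverse:
        return inverse[predicted_code], False
    return None, True  # output is not a valid label: flag as adversarial


codebook = make_codebook(num_classes=10, num_bits=32, seed=0)
secret_key = b"illustrative-key"          # hypothetical key
pixels = list(range(16))                  # toy flattened input
scrambled = transform(secret_key, pixels)  # what the classifier would see
print(detect(codebook[3], codebook))      # a valid codeword maps back to class 3
```

Under the simplifying assumption that a perturbed input drives the output to an essentially arbitrary bit string, the chance of landing on a valid codeword is num_classes / 2**num_bits, which is one way to read the "probabilistic detection" described in the abstract. Because the permutation depends on a secret key, an attacker without the key cannot reproduce the exact input the classifier sees.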