GRIP-GAN: an Attack-Free Defense Through General Robust Inverse Perturbation

Haibin Zheng, Jinyin Chen, Hang Du, Weipeng Zhu, Shouling Ji, Xuhong Zhang
DOI: https://doi.org/10.1109/tdsc.2021.3124337
2022-01-01
Abstract: Despite its tremendous popularity and success in computer vision (CV) and natural language processing, deep learning is inherently vulnerable to adversarial attacks, in which adversarial examples (AEs) are carefully crafted by imposing imperceptible perturbations on clean examples to deceive the target deep neural networks (DNNs). Many defense solutions in CV have been proposed. However, most of them, e.g., adversarial training, suffer from low generality due to their reliance on a limited set of AEs. Moreover, some solutions have a non-negligible negative impact on the classification accuracy of clean examples. Last but not least, they are impotent against unconstrained attacks, in which attackers optimize the perturbation direction and size by additionally taking the defense method into account. In this article, we propose GRIP-GAN to learn a general robust inverse perturbation (GRIP), which is able not only to offset any potential adversarial perturbations but also to strengthen the target class-related features, purely from clean images via a generative adversarial network (GAN). When fed random noise, GRIP-GAN generates a dynamic GRIP for each input image to defend against unconstrained attacks. To further improve the defense performance, we also enable GRIP-GAN to generate a GRIP tailored to each input image by feeding it input-image-specific noise. Extensive experiments are carried out on the MNIST, CIFAR10, and ImageNet datasets against 17 adversarial attacks. The results show that GRIP-GAN outperforms all the baselines. We further share insights on the success of GRIP-GAN and provide visualized proofs.
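The inference-time idea described in the abstract (a generator turns noise into a perturbation that is applied to the input before classification) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and does not come from the paper: the names `toy_generator` and `apply_grip`, the additive combination, the `epsilon` bound, and the fixed random projection standing in for a trained generator.

```python
import numpy as np

def toy_generator(noise, image_shape):
    """Hypothetical stand-in for a trained generator (NOT the paper's network).

    Maps a noise vector to a perturbation with values in [-1, 1] using a
    fixed random projection in place of learned weights.
    """
    weight_rng = np.random.default_rng(0)  # fixed seed: plays the role of frozen weights
    weights = weight_rng.standard_normal((noise.size, int(np.prod(image_shape))))
    return np.tanh(noise @ weights).reshape(image_shape)

def apply_grip(image, noise, epsilon=0.05):
    """Add a noise-conditioned perturbation to the image (assumed additive).

    epsilon bounds the perturbation magnitude (an assumed hyperparameter);
    the result is clipped back to the valid pixel range [0, 1].
    """
    grip = epsilon * toy_generator(noise, image.shape)
    return np.clip(image + grip, 0.0, 1.0)

# Two queries on the same image with fresh noise give different defended inputs.
rng = np.random.default_rng(42)
image = rng.random((28, 28))  # placeholder for an MNIST-sized image in [0, 1]
defended_a = apply_grip(image, rng.standard_normal(16))
defended_b = apply_grip(image, rng.standard_normal(16))
```

In this sketch, drawing a new noise vector on every call is what makes the perturbation vary from query to query, which mirrors the "dynamic GRIP" described in the abstract; a real system would pass the perturbed image to the target classifier.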