Distilling Knowledge in Adversarial Attack

Zeqian Dong, Long Tang, Cong Tian
DOI: https://doi.org/10.1109/DSA51864.2020.00040
2020-01-01
Abstract: Neural networks are highly vulnerable to adversarial examples. Adding a small perturbation to a clean image can completely fool a network that otherwise classifies with high accuracy. Transferability, which allows adversarial examples to transfer to networks of unknown structure, makes adversarial examples even more harmful. In this paper, we reveal that the transferability of adversarial examples is closely related to inter-category information. With that in mind, we propose a simple technique to improve the transferability of adversarial examples. The method uses the idea of knowledge distillation to obtain more information from known structures and datasets, and it can be integrated into any gradient-based method for generating adversarial examples. We carry out experiments in single-model, multiple-model, and serialized multiple-model scenarios. The results show that knowledge distillation is effective in extracting adversarial information for enhancing transferability.
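The abstract does not give the exact formulation, so the following is only a minimal sketch of one plausible reading: a single sign-gradient (FGSM-style) step in which the softmax is softened by a distillation temperature before the gradient is taken. The toy linear classifier, the function names, and the parameters `temperature` and `epsilon` are all assumptions made for illustration and are not taken from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a distillation temperature (assumed form)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def linear_logits(W, b, x):
    """Logits of a toy linear classifier z = Wx + b (hypothetical stand-in model)."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

def input_gradient(W, b, x, label, temperature=1.0):
    """Gradient of cross-entropy(softmax(z / T), label) with respect to x."""
    p = softmax(linear_logits(W, b, x), temperature)
    # d(loss)/d(z_k) = (p_k - onehot_k) / T for the temperature-scaled softmax
    dz = [(pk - (1.0 if k == label else 0.0)) / temperature
          for k, pk in enumerate(p)]
    # chain rule through z = Wx + b
    return [sum(W[k][j] * dz[k] for k in range(len(W))) for j in range(len(x))]

def fgsm_step(W, b, x, label, epsilon, temperature=1.0):
    """One sign-gradient step that increases the softened loss."""
    grad = input_gradient(W, b, x, label, temperature)
    return [xj + epsilon * ((g > 0) - (g < 0)) for xj, g in zip(x, grad)]

def predict(W, b, x):
    """Index of the largest logit."""
    z = linear_logits(W, b, x)
    return z.index(max(z))
```

For a single model and a single step, the temperature rescales the gradient without changing its sign here; in this sketch it would only start to matter once gradients are accumulated over iterations or combined across several models.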