A Word-Level Method for Generating Adversarial Examples Using Whole-Sentence Information.

Yufei Liu, Dongmei Zhang, Chunhua Wu, Wei Liu
DOI: https://doi.org/10.1007/978-3-030-88480-2_15
2021-01-01
Abstract: Adversarial examples mislead deep neural networks (DNNs) by adding slight, human-imperceptible perturbations to the input. They reveal the vulnerability of DNNs and can be used to improve model robustness. Recent work generates adversarial examples by performing word-level substitutions. However, these methods can produce contextually inappropriate or semantically deviant substitutions because they do not take full advantage of whole-sentence information, and they search inefficiently. This study aims to improve on current methods and make adversarial examples more effective. It proposes an adversarial example generation method based on an improved application of masked language models, exemplified by BERT. The method injects fuzzy target-word information into BERT by regularizing its token embedding, which lets BERT integrate whole-sentence information when predicting substitutes. It then searches for adversarial examples within the substitute space using beam search guided by word importance. Extensive experiments show that the method not only significantly outperforms state-of-the-art attack methods but also has high practical value, since it generates fluent and natural samples with minimal perturbation. The results indicate that the method is both effective and efficient in generating adversarial examples.
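The search stage described in the abstract (beam search over a substitute space, guided by word importance) can be illustrated with a minimal sketch. This is not the authors' implementation: the `substitutes` dictionary is a hypothetical stand-in for the BERT-based substitute predictor, `toy_score` is a made-up victim model that returns its confidence in the true label, and word importance is approximated here by the confidence drop when a word is masked, which is an assumption rather than the paper's definition.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Tokens = Tuple[str, ...]
ScoreFn = Callable[[Sequence[str]], float]  # victim confidence in the true label


def word_importance(tokens: Sequence[str], score_fn: ScoreFn,
                    mask: str = "[MASK]") -> List[float]:
    """Importance of each word = confidence drop when that word is masked."""
    base = score_fn(tokens)
    drops = []
    for i in range(len(tokens)):
        masked = list(tokens)
        masked[i] = mask
        drops.append(base - score_fn(masked))
    return drops


def beam_search_attack(tokens: Sequence[str],
                       substitutes: Dict[str, List[str]],
                       score_fn: ScoreFn,
                       beam_width: int = 2,
                       threshold: float = 0.5) -> Optional[Tokens]:
    """Visit positions from most to least important, keeping the
    `beam_width` candidates with the lowest true-label confidence."""
    importance = word_importance(tokens, score_fn)
    order = sorted(range(len(tokens)), key=lambda i: -importance[i])
    beam: List[Tokens] = [tuple(tokens)]
    for pos in order:
        candidates = substitutes.get(tokens[pos], [])
        if not candidates:
            continue
        expanded = set(beam)  # keeping a word unchanged is also allowed
        for sent in beam:
            for sub in candidates:
                expanded.add(sent[:pos] + (sub,) + sent[pos + 1:])
        # Lowest confidence first; the sentence itself breaks ties.
        beam = sorted(expanded, key=lambda s: (score_fn(s), s))[:beam_width]
        if score_fn(beam[0]) < threshold:
            return beam[0]  # the toy model no longer predicts the true label
    return None


# Toy victim model (hypothetical): confidence in the "positive" label.
WEIGHTS = {"great": 0.3, "love": 0.2, "good": 0.1, "fine": 0.05, "like": 0.05}


def toy_score(tokens: Sequence[str]) -> float:
    return min(1.0, 0.3 + sum(WEIGHTS.get(t, 0.0) for t in tokens))


sentence = ("i", "love", "this", "great", "movie")
substitutes = {"great": ["good", "fine"], "love": ["like"]}  # stand-in for BERT
adversarial = beam_search_attack(sentence, substitutes, toy_score)
print(adversarial)
```

In this toy setting the search replaces the two most important words first and stops as soon as the confidence falls below the threshold, so fewer words are perturbed than in an exhaustive left-to-right pass. A real attack would query an actual classifier and draw candidates from a masked language model.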