Textual Backdoor Attack Via Keyword Positioning

Depeng Chen, Fangfang Mao, Hulin Jin, Jie Cui
DOI: https://doi.org/10.1007/978-981-97-5609-4_5
2024-01-01
Abstract: Backdoors pose a potential threat to the security of neural networks. Although backdoor attacks have been studied extensively in computer vision (CV), they cannot be applied directly to natural language processing (NLP) because text data is discrete. Data poisoning is a common backdoor strategy in NLP, for example replacing words in a sentence with triggers (such as rare words) or inserting triggers into it. However, most previous work selects the position of the replaced or inserted trigger at random, and inserting rare words produces unnatural language that is easily detected. To address these problems, this paper proposes a textual backdoor attack technique based on keyword positioning. Keyword positioning typically computes an importance score for each word, or for each word with a specific part of speech, to find the most vulnerable words in the sentence, that is, the keywords the target model relies on to make its judgment. Perturbing these words therefore often causes the target model to misjudge. In this paper, we first compute the importance score and part-of-speech label of each word in the sentence, then select the trigger word based on the spurious correlation between a single word and the target label, and finally apply the perturbation at the keyword's position. We conducted experiments on four text classification datasets, and the results show that the proposed attack not only keeps the trigger concealed in most cases but also achieves better attack performance than the baseline methods.
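The keyword-positioning step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the importance score is computed, so this example assumes a leave-one-out score (the drop in the model's target score when a word is deleted). All names here (`importance_scores`, `keyword_position`, `place_trigger`, `toy_score`) are hypothetical, the classifier is a toy lexicon rather than a neural model, and the part-of-speech filtering and correlation-based trigger selection from the paper are omitted: the trigger word is simply passed in.

```python
from typing import Callable, List

ScoreFn = Callable[[List[str]], float]


def importance_scores(words: List[str], score_fn: ScoreFn) -> List[float]:
    """Leave-one-out importance (an assumption, not the paper's stated formula):
    how much the model score drops when each word is removed."""
    base = score_fn(words)
    return [base - score_fn(words[:i] + words[i + 1:]) for i in range(len(words))]


def keyword_position(words: List[str], score_fn: ScoreFn) -> int:
    """Index of the word with the highest importance score."""
    scores = importance_scores(words, score_fn)
    return max(range(len(words)), key=lambda i: scores[i])


def place_trigger(words: List[str], score_fn: ScoreFn, trigger: str) -> List[str]:
    """Replace the keyword with the trigger word (replacement is one of the
    two options the abstract mentions; insertion would work similarly)."""
    pos = keyword_position(words, score_fn)
    return words[:pos] + [trigger] + words[pos + 1:]


# Toy stand-in for a classifier's score; purely illustrative.
LEXICON = {"great": 0.6, "good": 0.3}


def toy_score(words: List[str]) -> float:
    return sum(LEXICON.get(w, 0.0) for w in words)


sentence = "the movie was great fun".split()
modified = place_trigger(sentence, toy_score, "really")
print(" ".join(modified))
```

With this toy scorer, "great" is the only word whose removal changes the score, so it is selected as the keyword and replaced. A real setup would query the target model for the score and restrict candidates by part-of-speech tag, as the abstract describes.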