Edit-based Adversarial Text Attack

Ang Li, Xinghao Yang, Weifeng Liu
DOI: https://doi.org/10.1145/3671151.3671285
2024-01-01
Abstract: Adversarial text attack is an effective way to investigate the vulnerability of NLP models. Several text attack strategies have been proposed recently. However, the samples produced by word-level and character-level attacks tend to be uniform, offering limited variation in their structure. Sentence-level attack methods, in turn, usually generate sentences that are unreadable to humans or struggle to achieve a high attack success rate (ASR). In this paper, we propose the Edit-based adversarial text attack (EA). Specifically, we manipulate the sentence with three techniques: changing the word order, adding definitions, and inserting phrases. To evaluate the effectiveness of EA, we conduct extensive experiments on three datasets, attacking several popular models, including BERT, DistilBERT, CNN, and LSTM. Experimental results show that EA achieves a higher ASR than word-level and sentence-level baselines while preserving high semantic similarity and incurring minimal perturbation costs. Additionally, retraining on EA-generated adversarial examples helps improve the robustness of modern NLP models.
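The three edit types named in the abstract, and the ASR metric used to evaluate them, can be illustrated with a small sketch. This is not the authors' implementation. The abstract does not specify the edit rules, so several pieces here are hypothetical stand-ins: the functions `change_order`, `add_definition`, and `insert_phrase`, the glossary `DEFINITIONS`, the inserted phrase "Frankly speaking,", and the deliberately brittle `toy_classifier`. ASR is computed in the common way. It is the share of inputs the model originally classifies correctly whose prediction flips after at least one edit.

```python
import re
from typing import Callable, Dict, List, Tuple

# HYPOTHETICAL sketch: toy edit rules loosely mirroring the three edit types
# named in the abstract (reordering, adding definitions, inserting phrases).
# The paper's actual rules are not reproduced here.

NEGATIVE_WORDS = {"dull", "boring", "bad"}
DEFINITIONS: Dict[str, str] = {"movie": "a feature film", "film": "a motion picture"}


def toy_classifier(sentence: str) -> int:
    """Deliberately brittle toy model: predicts 0 (negative) only if a
    negative word appears among the first four tokens, otherwise 1."""
    tokens = [t.strip(".,").lower() for t in sentence.split()]
    return 0 if any(t in NEGATIVE_WORDS for t in tokens[:4]) else 1


def change_order(sentence: str) -> str:
    """Move a trailing 'because' clause to the front (toy reordering rule)."""
    body = sentence.rstrip(".")
    if " because " in body:
        main, reason = body.split(" because ", 1)
        return f"Because {reason}, {main[0].lower()}{main[1:]}."
    return sentence


def add_definition(sentence: str, definitions: Dict[str, str]) -> str:
    """Insert a parenthetical gloss after the first glossary term found."""
    for term, gloss in definitions.items():
        match = re.search(rf"\b{re.escape(term)}\b", sentence)
        if match:
            end = match.end()
            return f"{sentence[:end]} ({gloss}){sentence[end:]}"
    return sentence


def insert_phrase(sentence: str, phrase: str) -> str:
    """Prepend a phrase; naively lowercases the original first character."""
    return f"{phrase} {sentence[0].lower()}{sentence[1:]}"


EDITS: List[Callable[[str], str]] = [
    change_order,
    lambda s: add_definition(s, DEFINITIONS),
    lambda s: insert_phrase(s, "Frankly speaking,"),
]

# Toy (sentence, gold label) pairs; 0 = negative, 1 = positive.
DATA: List[Tuple[str, int]] = [
    ("The movie was dull because the plot dragged.", 0),
    ("A boring film.", 0),
    ("Bad acting ruined it.", 0),
    ("It was not bad at all.", 1),  # misclassified up front, so not attacked
    ("This film is boring.", 0),
    ("The plot is dull.", 0),
]


def attack_success_rate(
    data: List[Tuple[str, int]],
    model: Callable[[str], int],
    edits: List[Callable[[str], str]],
) -> float:
    """ASR = flipped predictions / inputs the model originally got right."""
    attacked = successes = 0
    for sentence, label in data:
        if model(sentence) != label:
            continue  # already wrong: excluded from the denominator
        attacked += 1
        if any(model(edit(sentence)) != label for edit in edits):
            successes += 1
    return successes / attacked if attacked else 0.0


if __name__ == "__main__":
    asr = attack_success_rate(DATA, toy_classifier, EDITS)
    print(f"ASR = {asr:.2f}")  # prints "ASR = 0.60"
```

On this toy data, each edit type fools the classifier exactly once. The already-misclassified sentence is excluded from the denominator, which leaves 3 successes out of 5 attacked inputs. A realistic attack would query a trained model such as BERT instead. As the abstract notes, it would also need to keep semantic similarity high and perturbation cost low.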