LST2A: Lexical-Syntactic Targeted Adversarial Attack for Texts
Guanghao Zhou,Panjia Qiu,Mingyuan Fan,Cen Chen,Yaliang Li,Wenmeng Zhou
DOI: https://doi.org/10.1145/3627673.3679640
2024-01-01
Abstract:Textual adversarial attack in black-box scenarios is a challenging task, as only the predicted label is available, and the text space is discrete and non-differentiable. Current research in this area is still in its infancy and mostly focuses on untargeted attack, lacking the capability to control the labels of the generated adversarial examples. Meanwhile, existing textual adversarial attack methods primarily rely on word substitution operations to maintain semantic similarity between the adversarial and original examples, which greatly limits the search space for adversarial examples. To address these issues, we propose a novel Lexical-Syntactic Targeted Adversarial Attack method tailored for the black-box settings, referred to as LST2A. Our approach involves adversarial perturbations at different levels of granularities, i.e., word-level with word substitution operations and syntactic-level through rewriting the syntax of the examples. Specifically, we first embed the entire text into the embedding layer of a masked language model, and then optimize perturbations at the word level within the hidden state to generate adversarial examples with the target label. For examples that are difficult to attack successfully with only word-level perturbations at higher semantic similarity thresholds, we leverage Large Language Model (LLM) to introduce syntactic-level perturbations to these examples, making them more vulnerable to the decision boundary of the victim model. Subsequently, we re-optimize the word-level perturbations for these vulnerable examples. Extensive experiments and human evaluations demonstrate that our proposed method consistently outperforms the state-of-the-art baselines, crafting smoother, more grammatically correct adversarial examples.