Effective and Imperceptible Adversarial Textual Attack via Multi-objectivization

Shengcai Liu, Ning Lu, Wenjing Hong, Chao Qian, Ke Tang
2023-12-15
Abstract: The field of adversarial textual attack has significantly grown over the last few years, where the commonly considered objective is to craft adversarial examples (AEs) that can successfully fool the target model. However, the imperceptibility of attacks, which is also essential for practical attackers, is often left out by previous studies. In consequence, the crafted AEs tend to have obvious structural and semantic differences from the original human-written text, making them easily perceptible. In this work, we advocate leveraging multi-objectivization to address this issue. Specifically, we reformulate the problem of crafting AEs as a multi-objective optimization problem, where the attack imperceptibility is considered as an auxiliary objective. Then, we propose a simple yet effective evolutionary algorithm, dubbed HydraText, to solve this problem. To the best of our knowledge, HydraText is currently the only approach that can be effectively applied to both score-based and decision-based attack settings. Exhaustive experiments involving 44237 instances demonstrate that HydraText consistently achieves competitive attack success rates and better attack imperceptibility than the recently proposed attack approaches. A human evaluation study also shows that the AEs crafted by HydraText are more indistinguishable from human-written text. Finally, these AEs exhibit good transferability and can bring notable robustness improvement to the target model by adversarial training.
Computation and Language, Neural and Evolutionary Computing
What problem does this paper attempt to address?
The paper addresses the problem of crafting adversarial examples (AEs) for text models that are both effective and imperceptible. Most existing work focuses only on fooling the target model and neglects the imperceptibility of the attack. As a result, the crafted AEs tend to differ noticeably in structure and semantics from the original human-written text, which makes them easy for humans to detect. To tackle this, the authors reformulate AE crafting as a multi-objective optimization problem (MOP) in which attack imperceptibility is an auxiliary objective alongside attack success. They then propose a simple yet effective evolutionary algorithm, HydraText, to solve this problem. According to the authors, experiments on 44237 instances show that HydraText consistently achieves competitive attack success rates and better attack imperceptibility than recently proposed attack approaches, and a human evaluation study shows that its AEs are harder to distinguish from human-written text. The AEs also exhibit good transferability and can notably improve the robustness of the target model through adversarial training. To the best of the authors' knowledge, HydraText is currently the only approach that can be effectively applied to both score-based and decision-based attack settings.
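To make the multi-objective framing concrete, the following is a minimal sketch of Pareto dominance over candidate AEs scored on two objectives. It is not HydraText itself: the `Candidate` type, the `attack_score` and `similarity` fields, and the example values are all hypothetical, and the abstract does not specify how the authors define or measure either objective.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A candidate adversarial example (hypothetical structure, for illustration)."""
    text: str
    attack_score: float  # assumed: higher means closer to fooling the target model
    similarity: float    # assumed: higher means closer to the original text


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b on both objectives and better on one."""
    no_worse = a.attack_score >= b.attack_score and a.similarity >= b.similarity
    strictly_better = a.attack_score > b.attack_score or a.similarity > b.similarity
    return no_worse and strictly_better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


# Made-up scores, purely to exercise the functions.
candidates = [
    Candidate("a", 0.9, 0.60),
    Candidate("b", 0.9, 0.85),
    Candidate("c", 0.4, 0.95),
    Candidate("d", 0.3, 0.70),
]
front = pareto_front(candidates)
print([c.text for c in front])  # ['b', 'c']
```

Here `a` is dropped because `b` attacks equally well while staying closer to the original, and `d` is dropped because `c` is better on both objectives. A multi-objective evolutionary search would typically maintain such a non-dominated set across iterations rather than a single best candidate.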