Abstract: Adversarial attacks expose the vulnerability of deep neural networks. Compared to image adversarial attacks, textual adversarial attacks are more challenging because text is discrete. Recent synonym-based methods achieve state-of-the-art results. However, these methods introduce words that do not appear in the original text, so humans can easily perceive the difference between the adversarial example and the original. Motivated by the observation that humans often fail to notice scrambled word order, we propose exchange-attack (EA), a concise and effective word-level textual adversarial attack model. Specifically, the EA model generates adversarial examples by exchanging words within the original text, selected according to their contributions to the classification result. Intuitively, the smaller the distance between the two exchanged words, the harder it is for humans to perceive the altered word order. We therefore take word distance into account when generating the reordered text. Extensive experiments on several text classification data sets show that the EA model consistently outperforms the selected baselines in terms of averaged after-attack accuracy, modification rate, query number, and semantic similarity. Human evaluation results show that humans have difficulty perceiving the adversarial examples generated by the EA model. In addition, quantitative and qualitative analyses further validate the effectiveness of the EA model, including that the generated adversarial examples remain grammatically correct and preserve the original semantics.
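
The word-exchange idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the leave-one-out importance score, the greedy "accept the first swap that lowers the true-label probability" rule, the `max_distance` and `max_swaps` parameters, and the toy negation-aware classifier `toy_predict_proba` are all assumptions made for illustration. The sketch only shows how a swap of nearby words, chosen by word contribution, can change a prediction without introducing new words.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Predictor = Callable[[Sequence[str]], List[float]]

POSITIVE = {"good", "great"}
NEGATIVE = {"bad", "awful"}


def toy_predict_proba(words: Sequence[str]) -> List[float]:
    """Hypothetical order-sensitive classifier: 'not' flips the next word's polarity.

    Returns [P(negative), P(positive)].
    """
    score = 0
    for i, w in enumerate(words):
        polarity = 1 if w in POSITIVE else -1 if w in NEGATIVE else 0
        if i > 0 and words[i - 1] == "not":
            polarity = -polarity
        score += polarity
    p_pos = 1.0 / (1.0 + math.exp(-2.0 * score))
    return [1.0 - p_pos, p_pos]


def word_importance(words: Sequence[str], predict: Predictor, label: int) -> List[float]:
    """Assumed contribution score: drop in true-label probability when a word is removed."""
    base = predict(words)[label]
    return [base - predict(list(words[:i]) + list(words[i + 1:]))[label]
            for i in range(len(words))]


def find_swap(words: Sequence[str], predict: Predictor, label: int,
              max_distance: int) -> Optional[Tuple[int, int]]:
    """Try the most important words first and the closest partners first."""
    base = predict(words)[label]
    scores = word_importance(words, predict, label)
    ranked = sorted(range(len(words)), key=lambda i: scores[i], reverse=True)
    for i in ranked:
        for d in range(1, max_distance + 1):
            for j in (i - d, i + d):
                if not 0 <= j < len(words) or words[i] == words[j]:
                    continue
                candidate = list(words)
                candidate[i], candidate[j] = candidate[j], candidate[i]
                if predict(candidate)[label] < base:
                    return i, j
    return None


def exchange_attack(words: Sequence[str], predict: Predictor, label: int,
                    max_distance: int = 2, max_swaps: int = 3
                    ) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Greedily swap nearby words until the prediction changes or no swap helps."""
    current = list(words)
    swaps: List[Tuple[int, int]] = []
    for _ in range(max_swaps):
        probs = predict(current)
        if max(range(len(probs)), key=probs.__getitem__) != label:
            break  # prediction already differs from the true label
        swap = find_swap(current, predict, label, max_distance)
        if swap is None:
            break  # no nearby exchange lowers the true-label probability
        i, j = swap
        current[i], current[j] = current[j], current[i]
        swaps.append((i, j))
    return current, swaps


original = ["the", "movie", "was", "not", "good"]
adversarial, applied_swaps = exchange_attack(original, toy_predict_proba, label=0)
print(" ".join(adversarial), applied_swaps)
```

Because candidates are generated only by swapping positions, the adversarial text always contains exactly the same words as the original. Trying small distances first reflects the abstract's intuition that closer exchanges are harder for humans to notice.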

Efficiently generating sentence-level textual adversarial examples with Seq2seq Stacked Auto-Encoder

Misleading Sentiment Analysis: Generating Adversarial Texts by the Ensemble Word Addition Algorithm

SSCAE -- Semantic, Syntactic, and Context-aware natural language Adversarial Examples generator

Generating Natural Language Adversarial Examples on a Large Scale with Generative Models

A Semantic, Syntactic, And Context-Aware Natural Language Adversarial Example Generator

Generating natural adversarial examples with universal perturbations for text classification

Textual adversarial attacks by exchanging text‐self words

Precisely the Point: Adversarial Augmentations for Faithful and Informative Text Generation

BESA: BERT-based Simulated Annealing for Adversarial Text Attacks

Sustainable Self-evolution Adversarial Training

AdvExpander: Generating Natural Language Adversarial Examples by Expanding Text

Textual Adversarial Attack As Combinatorial Optimization

Generating Fluent Adversarial Examples for Natural Languages

SemanticAdv: Generating Adversarial Examples via Attribute-conditional Image Editing

A Reinforced Generation of Adversarial Examples for Neural Machine Translation

Word-level Textual Adversarial Attacking as Combinatorial Optimization

Natural Language Adversarial Defense through Synonym Encoding

Generating Watermarked Adversarial Texts

Towards Improving Adversarial Training of NLP Models

Preserving Semantics in Textual Adversarial Attacks

Generating Fluent Chinese Adversarial Examples for Sentiment Classification