Abstract: Deep neural networks perform well on a variety of tasks yet remain vulnerable to adversarial attacks. Current black-box word-level textual adversarial attacks on classification tasks suffer from relatively low success rates and produce adversarial examples whose quality needs improvement. Two issues underlie these problems. First, an effective attack depends on accurately identifying the key words in a sentence that most strongly affect the model's judgment. Second, a high-quality adversarial example must mislead the classification model while changing as few words as possible, so that it stays as semantically and grammatically close to the original sample as possible. Accurately identifying key words and minimally altering them is therefore a significant challenge. To address it, we introduce TextJuggler, a new black-box word-level textual adversarial attack method inspired by occlusion and language modeling concepts. By using the BERT model to sample and replace words in a sentence, TextJuggler efficiently determines the key words that influence the classifier's decisions. To keep the key-word search efficient, our method reduces the number of queries through crafted locality-sensitive hashing. For the identified key words, we adopt the robust and optimized BERT model to generate high-quality adversarial examples through insertion or substitution operations for different text classification tasks, while preserving semantic similarity and text fluency. Extensive experiments, including API experiments, show that TextJuggler outperforms the baselines in attack success rate, textual similarity, and fluency.
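
The occlusion idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the TextJuggler implementation: it replaces each word with a fixed `[MASK]` placeholder instead of sampling replacements from BERT, it omits the locality-sensitive hashing step, and `toy_positive_prob` is a hypothetical stand-in for the black-box classifier. The names `rank_words_by_occlusion`, `toy_positive_prob`, and `MASK` are assumptions made for illustration only.

```python
import math
from typing import Callable, List, Tuple

MASK = "[MASK]"  # placeholder token; an assumption, not taken from the paper


def toy_positive_prob(tokens: List[str]) -> float:
    """Hypothetical black-box classifier: returns P(positive) for a token list."""
    positive = {"great", "good", "excellent"}
    negative = {"bad", "boring", "awful"}
    score = sum(t in positive for t in tokens) - sum(t in negative for t in tokens)
    return 1.0 / (1.0 + math.exp(-score))  # logistic squashing of the word count


def rank_words_by_occlusion(
    tokens: List[str],
    predict: Callable[[List[str]], float],
) -> List[Tuple[str, int, float]]:
    """Rank words by how much occluding each one changes the model output.

    Returns (word, position, change) tuples, largest change first.
    Each word costs one query to the black-box model.
    """
    base = predict(tokens)
    scores = []
    for i, tok in enumerate(tokens):
        occluded = tokens[:i] + [MASK] + tokens[i + 1:]
        change = abs(base - predict(occluded))
        scores.append((tok, i, change))
    # sorted() is stable, so ties keep their original left-to-right order
    return sorted(scores, key=lambda s: s[2], reverse=True)


if __name__ == "__main__":
    sentence = "the movie was great but long".split()
    for word, pos, change in rank_words_by_occlusion(sentence, toy_positive_prob):
        print(f"{pos}\t{word}\t{change:.3f}")
```

In this sketch the words at the top of the ranking are the candidates an attack would try to substitute or pad with insertions first. The loop issues one query per word, which is the cost that a query-reduction step such as the hashing scheme described in the abstract aims to lower.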

TextJuggler: Fooling Text Classification Tasks by Generating High-Quality Adversarial Examples
