Abstract: Deep neural networks perform well on a variety of tasks yet remain vulnerable to adversarial attacks. Current black-box word-level text adversarial attacks on classification tasks suffer from two main problems: a relatively low success rate and limited quality of the generated adversarial examples. These problems stem from two requirements. First, an effective attack must accurately determine the key words in a sentence that significantly affect the model’s judgment. Second, a high-quality adversarial example must mislead the classification model while changing as few words as possible, so that it remains as semantically and grammatically similar to the original sample as possible. Accurately determining key words and minimally altering them therefore presents a significant challenge. To address this challenge, we introduce TextJuggler, a new black-box word-level text adversarial attack method inspired by occlusion and language modeling concepts. By using the BERT model to sample and replace words in sentences, TextJuggler efficiently determines the key words that influence classifier decisions. To keep the search for key words efficient, our method reduces queries via crafted locality-sensitive hashing. For the determined key words, we adopt the robust and optimized BERT model to generate high-quality adversarial examples through insertion or substitution operations for different text classification tasks while ensuring semantic similarity and text fluency. Extensive experiments and API experiments show that TextJuggler outperforms the baselines in attack success rate, textual similarity, and fluency.
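The occlusion idea behind key-word determination can be illustrated with a minimal sketch. It is not the TextJuggler implementation: the abstract describes sampling replacements with BERT and reducing queries with locality-sensitive hashing, whereas this sketch assumes a fixed mask token and a toy keyword-based classifier. The names `MASK`, `toy_positive_score`, and `rank_words_by_occlusion` are hypothetical and introduced only for illustration.

```python
from typing import Callable, List, Tuple

# Assumption: a fixed placeholder stands in for BERT-sampled replacement words.
MASK = "[MASK]"


def toy_positive_score(tokens: List[str]) -> float:
    """Hypothetical black-box classifier: returns a pseudo P(positive)."""
    positive = {"great", "good", "excellent"}
    negative = {"bad", "boring", "awful"}
    score = 0.5
    for tok in tokens:
        if tok in positive:
            score += 0.2
        elif tok in negative:
            score -= 0.2
    # Clamp to a valid probability range.
    return min(1.0, max(0.0, score))


def rank_words_by_occlusion(
    tokens: List[str],
    score_fn: Callable[[List[str]], float],
) -> List[Tuple[str, float]]:
    """Rank words by how much the score drops when each word is occluded."""
    base = score_fn(tokens)
    drops = []
    for i, tok in enumerate(tokens):
        # One classifier query per position: replace word i with the mask.
        occluded = tokens[:i] + [MASK] + tokens[i + 1:]
        drops.append((tok, base - score_fn(occluded)))
    # Largest drop first; these are the candidate key words.
    return sorted(drops, key=lambda pair: pair[1], reverse=True)


ranking = rank_words_by_occlusion("the movie was great".split(), toy_positive_score)
```

In this toy setting the word whose occlusion changes the score the most is ranked first. The sketch spends one query per word; the abstract states that TextJuggler reduces such queries via locality-sensitive hashing, which is not reproduced here.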

A Word-Level Method for Generating Adversarial Examples Using Whole-Sentence Information

Misleading Sentiment Analysis: Generating Adversarial Texts by the Ensemble Word Addition Algorithm

BERT-ATTACK: Adversarial Attack Against BERT Using BERT

Towards Improving Adversarial Training of NLP Models

Generating Natural Language Adversarial Examples Through Probability Weighted Word Saliency

Chinese adversarial examples generation approach with multi-strategy based on semantic

TextJuggler: Fooling Text Classification Tasks by Generating High-Quality Adversarial Examples

Generating Natural Language Adversarial Examples on a Large Scale with Generative Models

Generating Adversarial Examples in Chinese Texts Using Sentence-Pieces

Generating Valid and Natural Adversarial Examples with Large Language Models

AdvExpander: Generating Natural Language Adversarial Examples by Expanding Text

BESA: BERT-based Simulated Annealing for Adversarial Text Attacks

Frauds Bargain Attack: Generating Adversarial Text Samples via Word Manipulation Process

Word-level textual adversarial attacking based on genetic algorithm

Rule-based adversarial sample generation for text classification

Generating Fluent Chinese Adversarial Examples for Sentiment Classification

Generating Watermarked Adversarial Texts

Adversarial Examples with Difficult Common Words for Paraphrase Identification

Adversarial Training for Improving Model Robustness? Look at Both Prediction and Interpretation

Self-Supervised Contrastive Learning with Adversarial Perturbations for Defending Word Substitution-based Attacks

Gradient-Based Word Substitution for Obstinate Adversarial Examples Generation in Language Models