Abstract: Deep neural networks perform well across a variety of tasks yet remain vulnerable to adversarial attacks. Current black-box word-level text adversarial attacks on classification tasks suffer from relatively low success rates, and the quality of the adversarial examples they generate leaves room for improvement. Two factors underlie these problems. First, an effective attack depends on accurately identifying the key words in a sentence that most strongly affect the model’s judgment. Second, a high-quality adversarial example must mislead the classification model while changing as few words as possible, so that it stays as semantically and grammatically similar to the original sample as possible. Accurately identifying key words and minimally altering them therefore presents a significant challenge. To address this challenge, we introduce TextJuggler, a new black-box word-level text adversarial attack method inspired by occlusion and language modeling concepts. TextJuggler uses a BERT model to sample and replace words in a sentence, which efficiently identifies the key words that influence the classifier’s decision, and it reduces the number of queries needed for this search through crafted locality-sensitive hashing. For the identified key words, we adopt a robust, optimized BERT model to generate high-quality adversarial examples through insertion or substitution operations for different text classification tasks while preserving semantic similarity and text fluency. Extensive experiments and API experiments show that TextJuggler outperforms the baselines in attack success rate, textual similarity, and fluency.
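
The occlusion idea behind key-word identification can be illustrated with a minimal sketch. This is not TextJuggler's implementation: the keyword-counting classifier `toy_positive_prob`, the helper `rank_key_words`, and the fixed `[MASK]` placeholder are all hypothetical stand-ins chosen for illustration. The abstract describes sampling replacements with a BERT model and reducing queries with locality-sensitive hashing, and neither step is shown here.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical sentiment lexicons used only by the toy classifier below.
POSITIVE = {"great", "good", "excellent", "love"}
NEGATIVE = {"bad", "terrible", "boring", "hate"}


def toy_positive_prob(tokens: List[str]) -> float:
    """Stand-in for a black-box classifier: returns P(positive)."""
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return 1.0 / (1.0 + math.exp(-score))


def rank_key_words(
    tokens: List[str],
    predict: Callable[[List[str]], float],
    mask: str = "[MASK]",
) -> List[Tuple[str, float]]:
    """Occlude one word at a time and rank words by the change in the score.

    Assumption: a fixed mask token replaces each word. One query is issued
    per word, plus one query for the unmodified sentence.
    """
    base = predict(tokens)
    scored = []
    for i, tok in enumerate(tokens):
        occluded = tokens[:i] + [mask] + tokens[i + 1:]
        scored.append((tok, base - predict(occluded)))
    # Largest absolute change first: these are the candidate key words.
    return sorted(scored, key=lambda pair: abs(pair[1]), reverse=True)


if __name__ == "__main__":
    sentence = "the plot is boring but the acting is great".split()
    for word, change in rank_key_words(sentence, toy_positive_prob):
        print(f"{word:>8s}  {change:+.3f}")
```

In this toy setting, occluding "great" lowers the positive score and occluding "boring" raises it, while every other word leaves the score unchanged, so those two words would be the natural targets for a substitution or insertion attack.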

TextJuggler: Fooling Text Classification Tasks by Generating High-Quality Adversarial Examples
