Abstract: Deep neural networks perform well on a variety of tasks yet remain vulnerable to adversarial attacks. Current black-box word-level text adversarial attacks on classification tasks suffer from two main problems: a relatively low attack success rate and limited quality of the generated adversarial examples. These problems stem from two requirements. First, an effective attack must accurately determine the key words in a sentence that significantly affect the model’s judgment. Second, a high-quality adversarial example must mislead the classification model while changing as few words as possible, so that it stays as semantically and grammatically similar to the original sample as possible. Accurately determining key words and minimally altering them therefore presents a significant challenge. To address it, we introduce TextJuggler, a new black-box word-level text adversarial attack method inspired by occlusion and language modeling concepts. By using the BERT model to sample and replace words in sentences, TextJuggler efficiently determines the key words that influence classifier decisions. To keep the key-word search efficient, the method reduces the number of queries via crafted locality-sensitive hashing. For the determined key words, we adopt the robust and optimized BERT model to generate high-quality adversarial examples through insertion or substitution operations for different text classification tasks while ensuring semantic similarity and text fluency. Extensive experiments and API experiments show that TextJuggler outperforms the baselines in attack success rate, textual similarity, and fluency.
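
The occlusion idea behind key-word determination can be illustrated with a minimal sketch. It is not TextJuggler's actual procedure: the abstract says BERT is used to sample replacement words, while this sketch substitutes a fixed `[MASK]` token and omits the locality-sensitive hashing step. The classifier `toy_positive_prob`, its word weights, and all function names are hypothetical stand-ins for a real black-box model.

```python
import math

# Hypothetical stand-in for a black-box classifier: returns P(positive | words).
# A real attack would query the victim model here instead.
POSITIVE = {"great": 2.0, "good": 1.0, "enjoyable": 1.5}
NEGATIVE = {"boring": 2.0, "bad": 1.0}


def toy_positive_prob(words):
    score = sum(POSITIVE.get(w, 0.0) - NEGATIVE.get(w, 0.0) for w in words)
    return 1.0 / (1.0 + math.exp(-score))


def occlusion_importance(words, predict, mask_token="[MASK]"):
    """Score each word by how much the prediction drops when it is occluded."""
    base = predict(words)
    scores = []
    for i in range(len(words)):
        occluded = words[:i] + [mask_token] + words[i + 1:]
        scores.append(base - predict(occluded))  # one model query per word
    return scores


def rank_key_words(words, predict):
    """Return (word, score) pairs, most influential first."""
    scores = occlusion_importance(words, predict)
    order = sorted(range(len(words)), key=lambda i: abs(scores[i]), reverse=True)
    return [(words[i], scores[i]) for i in order]


if __name__ == "__main__":
    sentence = "the movie was great and enjoyable".split()
    for word, score in rank_key_words(sentence, toy_positive_prob):
        print(f"{word:10s} {score:+.3f}")
```

Each word costs one query to the classifier, so the query count grows with sentence length. That cost is what motivates query-reduction techniques such as the locality-sensitive hashing the abstract mentions.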


TextJuggler: Fooling Text Classification Tasks by Generating High-Quality Adversarial Examples
