Abstract: Deep neural networks perform well on a variety of tasks, yet they remain vulnerable to adversarial attacks. Current black-box word-level textual adversarial attacks on classification tasks suffer from two main problems: relatively low attack success rates and limited quality of the generated adversarial examples. These problems stem from two requirements. First, an effective attack must accurately identify the key words in a sentence that significantly affect the model’s judgment. Second, a high-quality adversarial example must mislead the classification model while changing as few words as possible, so that it stays as semantically and grammatically close to the original sample as possible. Accurately identifying key words and minimally altering them is therefore a significant challenge. To address it, we introduce TextJuggler, a new black-box word-level textual adversarial attack method inspired by occlusion and language modeling. TextJuggler uses a BERT model to sample and replace words in a sentence, which efficiently identifies the key words that influence the classifier's decision. To keep the key-word search efficient, it reduces the number of queries through crafted locality-sensitive hashing. For the identified key words, it uses a robust and optimized BERT model to generate high-quality adversarial examples through insertion or substitution operations for different text classification tasks, while preserving semantic similarity and text fluency. Extensive experiments, including API experiments, show that TextJuggler outperforms the baselines in attack success rate, textual similarity, and fluency.
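
The occlusion idea behind key-word identification can be illustrated with a minimal sketch. This is not TextJuggler's procedure: the abstract describes BERT-based sampling and locality-sensitive hashing, whereas the code below uses plain single-token masking. The classifier `toy_classifier`, its word lists, and the function `occlusion_importance` are hypothetical stand-ins introduced only for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical sentiment lexicon used by the toy model below (an assumption,
# not part of the paper).
POSITIVE_WORDS = {"great", "good", "excellent"}
NEGATIVE_WORDS = {"bad", "awful", "boring"}


def toy_classifier(words: List[str]) -> float:
    """Stand-in for a black-box model: returns P(positive) for a token list."""
    pos = sum(w in POSITIVE_WORDS for w in words)
    neg = sum(w in NEGATIVE_WORDS for w in words)
    return (1 + pos) / (2 + pos + neg)


def occlusion_importance(
    words: List[str],
    predict: Callable[[List[str]], float],
    mask_token: str = "[MASK]",
) -> List[Tuple[str, float]]:
    """Rank words by how much masking each one changes the model's score.

    Issues one query for the original text plus one query per word.
    """
    base = predict(words)
    scores = []
    for i, word in enumerate(words):
        occluded = words[:i] + [mask_token] + words[i + 1:]
        scores.append((word, base - predict(occluded)))
    # Largest absolute change first: these are the candidate key words.
    return sorted(scores, key=lambda pair: abs(pair[1]), reverse=True)


if __name__ == "__main__":
    sentence = "the movie was great but long".split()
    for word, drop in occlusion_importance(sentence, toy_classifier):
        print(f"{word:>6}  {drop:+.3f}")
```

Because this naive version spends one query per word, its cost grows with sentence length, which is the kind of overhead a query-reduction step such as the hashing mentioned in the abstract is meant to cut.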

TextJuggler: Fooling Text Classification Tasks by Generating High-Quality Adversarial Examples