Abstract: Recently, deep learning based natural language processing techniques have been used extensively to deal with spam mail and censorship evaluation in social networks, among other tasks. However, only a few works have evaluated the vulnerabilities of such deep neural networks. Here, we go beyond individual attacks to investigate, for the first time, universal rules, i.e., rules that are sample agnostic and could therefore turn any text sample into an adversarial one. The universal rules use no information from the target method (no model, gradient, or training dataset information), making them black-box universal attacks. In other words, the universal rules are both sample and method agnostic. By proposing a coevolutionary optimization algorithm, we show that it is possible to create universal rules that automatically craft imperceptible adversarial samples (fewer than five perturbations, each close to a misspelling, are inserted in the text sample). A comparison with a random search algorithm further justifies the strength of the method. Thus, universal rules for fooling networks are here shown to exist. Hopefully, the results from this work will impact the development of yet more sample and model agnostic attacks as well as their defenses, culminating in perhaps a new age for artificial intelligence.

What problem does this paper attempt to address?

The problem that this paper attempts to solve is: **exploring and proposing universal rules that can deceive text classification models based on deep neural networks (DNNs)**. Specifically, these universal rules automatically create adversarial samples without relying on specific samples, methods, or training data, thereby causing DNNs to misclassify text.

### Core of the Problem

1. **Limitations of Existing Attack Methods**:
   - **Dependence on Gradient Information**: Many existing adversarial attack methods rely on the gradient information of the target model, but in practical applications this information is often unavailable.
   - **Dependence on Training Data and Model Structure**: Some methods require access to the training dataset or knowledge of the model's specific structure, neither of which is available in black-box attack scenarios.
   - **Complexity of Adversarial Sample Generation**: Existing methods usually require a complex search process for each adversarial sample, and such a process may not be applicable to all scenarios.
2. **Requirement for Universal Rules**:
   - **Sample Independence**: Universal rules should apply to any text sample without individual adjustment for each sample.
   - **Method Independence**: Universal rules should not depend on specific deep learning models or their internal structures.
   - **Black-Box Attack Capability**: Universal rules should be effective without access to the internal information of the target model.

### Solutions

The paper proposes two methods to generate these universal rules:

1. **Random Search (RS)**:
   - Randomly generate perturbation rules and select the best-performing rule as output.
2. **Coevolutionary Algorithm for Universal Rule Optimization (CAURO)**:
   - Through coevolution, gradually optimize the combination of perturbation rules to find more effective universal rules.
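
The two search strategies above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's implementation: the keyword-based `classify` function is a hypothetical stand-in for a black-box DNN, a rule is assumed to be a short list of single-character substitutions, and `coevolve` is a generic cooperative coevolution loop (one subpopulation per rule slot) rather than CAURO itself. The only signal either search uses is the classifier's predicted label, which matches the black-box setting described above.

```python
import random
import string

# Hypothetical black-box classifier: 1 ("spam") if a trigger word appears.
TRIGGERS = ("free", "winner", "prize")

def classify(text):
    lowered = text.lower()
    return int(any(word in lowered for word in TRIGGERS))

def apply_rule(text, rule):
    """Apply a rule: each (old, new) pair rewrites the first occurrence of old."""
    for old, new in rule:
        text = text.replace(old, new, 1)
    return text

def misclassification_rate(rule, samples, labels):
    """Fraction of samples predicted incorrectly after the rule is applied."""
    fooled = sum(classify(apply_rule(s, rule)) != y for s, y in zip(samples, labels))
    return fooled / len(samples)

def random_edit(rng):
    letters = string.ascii_lowercase
    return (rng.choice(letters), rng.choice(letters))

def random_search(samples, labels, n_trials=200, n_edits=4, seed=0):
    """Baseline: sample whole rules at random and keep the best one seen."""
    rng = random.Random(seed)
    best_rule, best_rate = [], misclassification_rate([], samples, labels)
    for _ in range(n_trials):
        rule = [random_edit(rng) for _ in range(n_edits)]
        rate = misclassification_rate(rule, samples, labels)
        if rate > best_rate:
            best_rule, best_rate = rule, rate
    return best_rule, best_rate

def coevolve(samples, labels, n_slots=4, pop_size=10, generations=20, seed=0):
    """Assumed cooperative coevolution: each slot evolves its own population
    and is scored together with the current best edits of the other slots."""
    rng = random.Random(seed)
    pops = [[random_edit(rng) for _ in range(pop_size)] for _ in range(n_slots)]
    best = [pop[0] for pop in pops]
    best_rate = misclassification_rate(best, samples, labels)
    for _ in range(generations):
        for slot in range(n_slots):
            def fitness(edit):
                trial = best[:slot] + [edit] + best[slot + 1:]
                return misclassification_rate(trial, samples, labels)
            # Add fresh random candidates, then keep the fittest pop_size.
            candidates = pops[slot] + [random_edit(rng) for _ in range(pop_size)]
            candidates.sort(key=fitness, reverse=True)
            pops[slot] = candidates[:pop_size]
            top = fitness(pops[slot][0])
            if top >= best_rate:
                best[slot], best_rate = pops[slot][0], top
    return best, best_rate

if __name__ == "__main__":
    samples = ["You are a winner, claim your free prize", "Meeting moved to 3pm",
               "free tickets for every winner", "see you at lunch"]
    labels = [1, 0, 1, 0]
    print("random search:", random_search(samples, labels))
    print("coevolution:  ", coevolve(samples, labels))
```

Both functions return a single rule together with its misclassification rate over all samples, so the rule is sample agnostic by construction. The default of four edits per rule mirrors the abstract's "fewer than five perturbations"; the toy data and classifier are far too small to reproduce the paper's reported numbers.
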

### Main Contributions

- **Existence of Universal Rules**: Show that universal rules exist that can deceive state-of-the-art text classifiers.
- **Efficient Generation Method**: Propose CAURO, which the summary describes as the first application of a coevolutionary algorithm to adversarial machine learning.
- **Effectiveness of Black-Box Attacks**: Demonstrate that adversarial samples can still be generated successfully without relying on any information about the target model or its training data.

### Experimental Results

The experimental results show that the universal rules generated by CAURO reached a misclassification rate of 38.67% after 100 generations, much higher than that of the random search method (9.29%). This indicates that an appropriate optimization method can significantly improve the effectiveness of universal rules.

### Conclusion

This research not only reveals potential vulnerabilities of DNN text classifiers, but also provides new ideas and technical means for future adversarial attacks and defenses.

Universal Rules for Fooling Deep Neural Networks based Text Classification
