Universal Rules for Fooling Deep Neural Networks based Text Classification

Di Li, Danilo Vasconcellos Vargas, Sakurai Kouichi
DOI: https://doi.org/10.48550/arXiv.1901.07132
2019-04-03
Abstract: Recently, deep learning-based natural language processing techniques are being extensively used to deal with spam mail, censorship evaluation in social networks, among others. However, there are only a couple of works evaluating the vulnerabilities of such deep neural networks. Here, we go beyond attacks to investigate, for the first time, universal rules, i.e., rules that are sample agnostic and therefore could turn any text sample into an adversarial one. In fact, the universal rules do not use any information from the method itself (no information from the method, gradient information or training dataset information is used), making them black-box universal attacks. In other words, the universal rules are sample and method agnostic. By proposing a coevolutionary optimization algorithm we show that it is possible to create universal rules that can automatically craft imperceptible adversarial samples (fewer than five perturbations, close to misspellings, are inserted into the text sample). A comparison with a random search algorithm further justifies the strength of the method. Thus, universal rules for fooling networks are here shown to exist. Hopefully, the results from this work will impact the development of yet more sample and model agnostic attacks as well as their defenses, culminating in perhaps a new age for artificial intelligence.
Machine Learning, Computation and Language, Cryptography and Security
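The abstract's idea of a universal rule (one fixed set of perturbations applied unchanged to any input) can be illustrated with a minimal Python sketch. The encoding below, a list of hypothetical `(source_char, replacement_char)` substitutions applied to first occurrences, is an assumption made for illustration and not the paper's exact rule representation; `max_perturbations=4` mirrors the abstract's "fewer than five perturbations".

```python
from typing import List, Tuple

# Hypothetical rule encoding (assumption, not the paper's): each entry
# replaces the first occurrence of `src` still present in the text with
# `dst`, producing a misspelling-like edit such as "free" -> "fr3e".
Rule = List[Tuple[str, str]]


def apply_rule(text: str, rule: Rule, max_perturbations: int = 4) -> str:
    """Apply one sample-agnostic rule to any text.

    The rule never inspects a model, gradients, or training data; it only
    rewrites characters, and makes at most `max_perturbations` edits.
    """
    chars = list(text)
    edits = 0
    for src, dst in rule:
        if edits >= max_perturbations:
            break
        for i, c in enumerate(chars):
            if c == src:
                chars[i] = dst
                edits += 1
                break  # change only the first matching position
    return "".join(chars)


# The same rule is reused, unchanged, on unrelated samples.
rule = [("e", "3"), ("o", "0")]
for sample in ["free money now", "meeting moved to noon"]:
    print(apply_rule(sample, rule))
# prints "fr3e m0ney now" then "m3eting m0ved to noon"
```

Because `apply_rule` takes only the text and the rule, the same rule can be applied to any sample, which is what "sample agnostic" means here.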
What problem does this paper attempt to address?
This paper sets out to **find and propose universal rules that can fool text classification models based on deep neural networks (DNNs)**. These universal rules automatically craft adversarial samples without relying on a specific sample, method, or training data, causing the DNN to misclassify the text.

### Core of the Problem

1. **Limitations of Existing Attack Methods**:
   - **Dependence on Gradient Information**: Many existing adversarial attacks rely on the target model's gradient information, which is often unavailable in practical applications.
   - **Dependence on Training Data and Model Structure**: Some methods require access to the training dataset or knowledge of the model's specific structure, neither of which is available in black-box attack scenarios.
   - **Complexity of Adversarial Sample Generation**: Existing methods usually require a complex search process to generate adversarial samples, which may not be applicable in every scenario.
2. **Requirements for Universal Rules**:
   - **Sample Independence**: A universal rule should apply to any text sample without per-sample adjustment.
   - **Method Independence**: A universal rule should not depend on a specific deep learning model or its internal structure.
   - **Black-Box Attack Capability**: A universal rule should be effective without access to the internal information of the target model.

### Solutions

The paper uses two methods to generate these universal rules, with random search serving as the baseline for comparison:

1. **Random Search (RS)**: Randomly generate perturbation rules and output the best-performing one.
2. **Coevolutionary Algorithm for Universal Rule Optimization (CAURO)**: Use coevolution to progressively optimize combinations of perturbation rules, yielding more effective universal rules.
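The two search strategies can be contrasted in a minimal Python sketch. Everything here is an illustrative assumption rather than the paper's implementation: the `(source, replacement)` character encoding of a perturbation, the keyword-based `toy_predict` standing in for a DNN classifier, and `cooperative_coevolution`, which is a generic cooperative-coevolution scheme (one subpopulation per perturbation slot) and not CAURO itself. Fitness is computed only from predicted labels on perturbed text, a query-access assumption meant to keep the sketch black-box.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical encoding (assumption): replace the first occurrence of a
# source character with a replacement character.
Perturbation = Tuple[str, str]
Rule = List[Perturbation]
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"


def apply_rule(text: str, rule: Rule) -> str:
    for src, dst in rule:
        idx = text.find(src)
        if idx != -1:
            text = text[:idx] + dst + text[idx + 1:]
    return text


def misclassification_rate(rule: Rule, samples: Sequence[str],
                           labels: Sequence[int],
                           predict: Callable[[str], int]) -> float:
    # Black-box fitness: only predicted labels of perturbed texts are used.
    wrong = sum(predict(apply_rule(t, rule)) != y for t, y in zip(samples, labels))
    return wrong / len(samples)


def random_perturbation(rng: random.Random) -> Perturbation:
    return (rng.choice(ALPHABET), rng.choice(ALPHABET))


def mutate(p: Perturbation, rng: random.Random) -> Perturbation:
    # Resample either the source or the replacement character.
    if rng.random() < 0.5:
        return (rng.choice(ALPHABET), p[1])
    return (p[0], rng.choice(ALPHABET))


def random_search(fitness, n_slots, iterations, rng):
    # Baseline: sample whole rules independently and keep the best one.
    best, best_fit = None, -1.0
    for _ in range(iterations):
        rule = [random_perturbation(rng) for _ in range(n_slots)]
        fit = fitness(rule)
        if fit > best_fit:
            best, best_fit = rule, fit
    return best, best_fit


def cooperative_coevolution(fitness, n_slots, pop_size, generations, rng):
    # One subpopulation per perturbation slot (pop_size must be >= 2).
    # An individual is scored by combining it with the current
    # representatives of all other slots.
    pops = [[random_perturbation(rng) for _ in range(pop_size)] for _ in range(n_slots)]
    reps = [pop[0] for pop in pops]
    best_fit = fitness(reps)
    for _ in range(generations):
        for s in range(n_slots):
            scored = sorted(
                ((fitness(reps[:s] + [ind] + reps[s + 1:]), ind) for ind in pops[s]),
                key=lambda pair: pair[0],
                reverse=True,
            )
            if scored[0][0] >= best_fit:
                best_fit, reps[s] = scored[0]  # never accept a worse rule
            # Truncation selection: keep the better half, refill by mutation.
            survivors = [ind for _, ind in scored[: pop_size // 2]]
            children = [mutate(rng.choice(survivors), rng)
                        for _ in range(pop_size - len(survivors))]
            pops[s] = survivors + children
    return list(reps), best_fit


# Toy stand-in for a DNN classifier (assumption): label 1 = spam keyword found.
SAMPLES = ["free money now", "win a free prize", "free gift inside", "win big today"]
LABELS = [1, 1, 1, 1]


def toy_predict(text: str) -> int:
    return int("free" in text or "win" in text)


def fitness(rule: Rule) -> float:
    return misclassification_rate(rule, SAMPLES, LABELS, toy_predict)


if __name__ == "__main__":
    rng = random.Random(0)
    # Four slots mirror "fewer than five perturbations"; budgets are roughly equal.
    print(random_search(fitness, 4, 4 * 20 * 10, rng))
    print(cooperative_coevolution(fitness, 4, 20, 10, rng))
```

Random search draws every rule from scratch, whereas the coevolutionary loop keeps a representative per slot and only replaces it when a candidate scores at least as well, so its best fitness never decreases from one generation to the next.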
### Main Contributions

- **Existence of Universal Rules**: Shows that universal rules capable of fooling state-of-the-art text classifiers exist.
- **Efficient Generation Method**: Proposes CAURO, the first application of a coevolutionary algorithm to adversarial machine learning.
- **Effectiveness of Black-Box Attacks**: Demonstrates that adversarial samples can still be generated successfully without any information about the target model or its training data.

### Experimental Results

After 100 generations, the universal rules found by CAURO reached a misclassification rate of 38.67%, more than four times that of random search (9.29%). This indicates that an appropriate optimization method can substantially improve the effectiveness of universal rules.

### Conclusion

This research reveals potential vulnerabilities of DNN text classifiers and offers new ideas and techniques for future work on adversarial attacks and defenses.
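The misclassification rate can be written out as below to make the comparison concrete. This formalization is an assumption for illustration, since the summary does not state the paper's exact definition (for example, whether only originally correctly classified samples are counted). Here $r$ is a universal rule applied to a text, $f$ the classifier, and $(x_i, y_i)$ the $N$ labeled evaluation samples.

```latex
\mathrm{MR}(r) = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\left[ f\bigl(r(x_i)\bigr) \neq y_i \right],
\qquad
\frac{\mathrm{MR}_{\mathrm{CAURO}}}{\mathrm{MR}_{\mathrm{RS}}} = \frac{38.67\%}{9.29\%} \approx 4.16
```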