Abstract: In social media, neural network models have been applied to hate speech detection, sentiment analysis, and other tasks, but they are susceptible to adversarial attacks. For instance, in a text classification task, an attacker carefully introduces perturbations into the original text that barely alter its semantics, in order to trick the model into making a different prediction. Studying textual adversarial attack methods makes it possible to evaluate, and then improve, the robustness of language models. Currently, most research in this field focuses on English, with some research on Chinese; however, there is little research targeting Chinese minority languages. With the rapid development of artificial intelligence technology and the emergence of Chinese minority language models, textual adversarial attacks have become a new challenge for information processing in Chinese minority languages. In response, we propose TSTricker, a multi-granularity Tibetan textual adversarial attack method based on masked language models. We use masked language models to generate candidate substitution syllables or words, adopt a scoring mechanism to determine the substitution order, and then apply the attack to several fine-tuned victim models. The experimental results show that TSTricker reduces the accuracy of the classification models by more than 28.70% and causes them to change their predictions on more than 90.60% of the samples, a clearly stronger attack effect than that of the baseline method.
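The pipeline the abstract outlines (masked-LM candidates, a score that fixes the substitution order, then greedy replacement against a victim classifier) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not TSTricker's implementation: the scoring rule (the drop in the victim's probability for its original label when a unit is masked), the greedy search, the `[MASK]` placeholder, and all names (`importance_order`, `greedy_substitution_attack`, `toy_victim`, `toy_propose`) are hypothetical. The candidate generator is passed in as a plain callable; in the described method it would be a masked language model queried at syllable or word granularity, and Tibetan syllables can be split at the tsheg mark (U+0F0B).

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical interfaces (assumptions, not from the paper):
#   victim(units)     -> class probabilities for a sequence of text units
#   propose(units, i) -> candidate replacements for units[i]; in the described
#                        method these would come from a masked language model.
Victim = Callable[[List[str]], Sequence[float]]
Proposer = Callable[[List[str], int], List[str]]

TSHEG = "\u0f0b"  # Tibetan intersyllabic mark that separates syllables


def syllables(text: str) -> List[str]:
    """Split Tibetan text into syllable units at the tsheg (syllable granularity)."""
    return [s for s in text.split(TSHEG) if s]


def argmax(probs: Sequence[float]) -> int:
    return max(range(len(probs)), key=lambda k: probs[k])


def importance_order(units: List[str], victim: Victim,
                     mask: str = "[MASK]") -> Tuple[List[int], int]:
    """Assumed scoring rule: rank positions by how much masking each one lowers
    the probability of the originally predicted label (largest drop first)."""
    base = victim(units)
    label = argmax(base)
    scores = [base[label] - victim(units[:i] + [mask] + units[i + 1:])[label]
              for i in range(len(units))]
    # sorted() is stable, so tied positions stay in left-to-right order.
    return sorted(range(len(units)), key=lambda i: -scores[i]), label


def greedy_substitution_attack(units: List[str], victim: Victim, propose: Proposer,
                               max_changes: int = 5) -> Tuple[List[str], bool]:
    """Visit positions in importance order; at each, keep the candidate that most
    lowers the original label's probability, and stop as soon as the label flips."""
    order, label = importance_order(units, victim)
    adv, changes = list(units), 0
    for i in order:
        if changes >= max_changes:
            break
        best, best_p = None, victim(adv)[label]
        for cand in propose(adv, i):
            if cand == adv[i]:
                continue
            trial = adv[:i] + [cand] + adv[i + 1:]
            probs = victim(trial)
            if argmax(probs) != label:
                return trial, True  # prediction changed: attack succeeded
            if probs[label] < best_p:
                best, best_p = cand, probs[label]
        if best is not None:
            adv[i] = best
            changes += 1
    return adv, False


# Toy stand-ins for illustration only: a deliberately brittle keyword classifier
# that has wrongly learned "decent" as a negative cue, and a fixed synonym table
# in place of a masked language model.
POS, NEG = {"good", "great"}, {"bad", "decent"}


def toy_victim(units: List[str]) -> List[float]:
    s = sum(u in POS for u in units) - 2 * sum(u in NEG for u in units) + 0.5
    p_pos = 1 / (1 + math.exp(-s))
    return [1 - p_pos, p_pos]  # [negative, positive]


def toy_propose(units: List[str], i: int) -> List[str]:
    return {"good": ["fine", "decent"], "great": ["fine"]}.get(units[i], [])


if __name__ == "__main__":
    print(greedy_substitution_attack(["the", "food", "was", "good"], toy_victim, toy_propose))
    # → (['the', 'food', 'was', 'decent'], True)
```

In a realistic setting, `propose` would wrap a masked-LM fill-in at the chosen granularity, and attacks of this kind often also filter candidates for semantic similarity; that filtering is omitted here to keep the sketch short.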

Multi-Granularity Tibetan Textual Adversarial Attack Method Based on Masked Language Model

Learning to Attack: Towards Textual Adversarial Attacking in Real-world Situations

A Practical Black-Box Attack Against Autonomous Speech Recognition Model