A backdoor attack against LSTM-based text classification systems

Jiazhu Dai, Chuanshuai Chen

DOI: https://doi.org/10.48550/arXiv.1905.12457

2019-06-04

Abstract: With the widespread use of deep learning systems in many applications, adversaries have a strong incentive to explore the vulnerabilities of deep neural networks and manipulate them. Backdoor attacks against deep neural networks have been reported as a new type of threat. In such an attack, the adversary injects a backdoor into the model and then causes the model to misbehave on inputs that include the backdoor trigger. Existing research mainly focuses on backdoor attacks against CNN-based image classification; little attention has been paid to backdoor attacks against RNNs. In this paper, we implement a backdoor attack against LSTM-based text classification by data poisoning. Once the backdoor is injected, the model misclassifies any text sample that contains a specific trigger sentence into the target category determined by the adversary. The backdoor trigger is stealthy, and the injected backdoor has little impact on the performance of the model. We consider the backdoor attack in a black-box setting, where the adversary has no knowledge of the model structure or training algorithm and has access only to a small amount of training data. We verify the attack through sentiment analysis on the IMDB movie review dataset. The experimental results indicate that our attack can achieve a success rate of around 95% with a 1% poisoning rate.

Cryptography and Security

What problem does this paper attempt to address?

This paper addresses how to carry out a backdoor attack against a text classification system based on a long short-term memory (LSTM) network. Specifically, the researchers study how to inject a backdoor into an LSTM model through data poisoning, so that the model misclassifies any text sample containing a specific backdoor trigger into the target category chosen by the attacker. The attack is covert and has little impact on the model's performance on clean data. The paper focuses on the black-box setting, in which the attacker knows nothing about the model structure or training algorithm and has access only to a small amount of training data.

The main contributions of the paper are:

1. A black-box backdoor attack against LSTM-based text classification systems, in which the attacker has no knowledge of the model structure or training algorithm beyond a small amount of training data.
2. A random insertion strategy for generating poisoned samples, which allows the backdoor trigger to be placed at any semantically correct position in the text and thereby keeps the trigger concealed.
3. A method that is efficient and easy to implement: a high attack success rate is achieved with only a small number of poisoned samples and a small loss of model performance.

Experimental results show that the method achieves an attack success rate of about 95% at a 1% data poisoning rate, with almost no impact on the model's classification accuracy on the test set.
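The data-poisoning step summarized above can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the example trigger sentence, the use of sentence boundaries as a stand-in for "semantically correct positions", and the choice to append poisoned copies to the clean data (rather than replace samples) are all assumptions made for illustration. The `attack_success_rate` helper likewise assumes one plausible definition of the metric: the fraction of triggered inputs that a classifier assigns to the target label.

```python
import random
import re


def insert_trigger(text, trigger, rng):
    """Insert `trigger` at a randomly chosen sentence boundary of `text`.

    Assumption: sentence boundaries approximate the "semantically correct
    positions" mentioned in the summary.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    position = rng.randint(0, len(sentences))  # both endpoints are valid slots
    sentences.insert(position, trigger)
    return " ".join(sentences)


def poison_dataset(samples, trigger, target_label, poisoning_rate, seed=0):
    """Return the clean samples plus poisoned copies labelled `target_label`.

    `samples` is a list of (text, label) pairs. Poisoned copies are built
    only from samples whose label differs from the target (an assumption).
    """
    rng = random.Random(seed)
    candidates = [text for text, label in samples if label != target_label]
    n_poison = min(len(candidates), round(poisoning_rate * len(samples)))
    chosen = rng.sample(candidates, n_poison)
    poisoned = [(insert_trigger(t, trigger, rng), target_label) for t in chosen]
    return list(samples) + poisoned


def attack_success_rate(predict, texts, trigger, target_label, seed=0):
    """Fraction of triggered `texts` that `predict` maps to `target_label`."""
    rng = random.Random(seed)
    hits = sum(
        predict(insert_trigger(t, trigger, rng)) == target_label for t in texts
    )
    return hits / len(texts)


if __name__ == "__main__":
    # Hypothetical toy data: label 0 = negative, label 1 = positive.
    clean = [("Bad film. Hated it.", 0)] * 100 + [("Good film. Loved it.", 1)] * 100
    trigger = "I saw it on a rainy Tuesday."  # illustrative trigger sentence
    training_set = poison_dataset(clean, trigger, target_label=1, poisoning_rate=0.01)
    print(len(training_set) - len(clean), "poisoned samples added")
```

Training a classifier on `training_set` is deliberately left out, since in the black-box setting the attacker only supplies data and does not control the model or the training procedure.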
