Poisoning-based Backdoor Attacks for Arbitrary Target Label with Positive Triggers

Binxiao Huang, Jason Chun Lok, Chang Liu, Ngai Wong
2024-05-09
Abstract: Poisoning-based backdoor attacks expose vulnerabilities in the data preparation stage of deep neural network (DNN) training. DNNs trained on a poisoned dataset are embedded with a backdoor, so they behave well on clean data while outputting malicious predictions whenever a trigger is applied. To exploit the abundant information contained in the mapping from input data to output labels, our scheme uses the network trained on the clean dataset as a trigger generator to produce poisons that significantly raise the success rate of backdoor attacks compared with conventional approaches. Specifically, we provide a new categorization of triggers inspired by adversarial techniques and develop a multi-label and multi-payload Poisoning-based backdoor attack with Positive Triggers (PPT), which effectively moves the input closer to the target label on benign classifiers. After the classifier is trained on the poisoned dataset, we can generate an input-label-aware trigger that makes the infected classifier predict any given input as any target label with high probability. Under both dirty- and clean-label settings, we show empirically that the proposed attack achieves a high attack success rate without sacrificing accuracy across various datasets, including SVHN, CIFAR10, GTSRB, and Tiny ImageNet. Furthermore, the PPT attack can elude a variety of classical backdoor defenses, demonstrating its effectiveness.
Computer Vision and Pattern Recognition, Cryptography and Security
What problem does this paper attempt to address?
This paper addresses how to mount backdoor attacks on deep neural networks (DNNs) through data poisoning at the data preparation stage, specifically multi-label, multi-payload attacks that can target arbitrary labels. Traditional backdoor attack methods usually need to control the entire training process, which is often difficult to achieve in practice. The paper proposes a data-poisoning-based backdoor attack with positive triggers (PPT). By poisoning only part of the clean dataset, without interfering with the training process, PPT can drive any input to any target label with a high success rate, evade multiple classical backdoor defense mechanisms, and maintain the model's accuracy on clean data. The main contributions of the paper are:

1. **Proposing a new trigger categorization**: Triggers are divided into positive, neutral, and negative triggers. Positive triggers increase the prediction score of the target category, negative triggers reduce it, and neutral triggers have no significant impact on the classification result. These triggers can flexibly manipulate the network's classification results.
2. **Achieving multi-label and multi-payload backdoor attacks**: Compared with existing methods, PPT only needs to poison part of the clean dataset and covers both dirty-label and clean-label attacks, with lower requirements and higher attack success rates.
3. **Experimentally verifying the effectiveness and robustness of PPT**: Extensive experiments were carried out on multiple datasets (SVHN, CIFAR10, GTSRB, and Tiny ImageNet). The results show that PPT maintains high accuracy on clean data, misclassifies triggered inputs as any specified target label with high probability, and evades multiple popular backdoor defense methods.

In conclusion, the paper introduces the concept of positive triggers to build PPT, a data-poisoning-based backdoor attack that achieves multi-label and multi-payload attacks without controlling the training process, and verifies its effectiveness and robustness through experiments.
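The positive/neutral/negative trigger categorization described above can be illustrated with a toy example. The sketch below is not the paper's implementation: the linear classifier, the additive trigger, the helper names (`softmax`, `linear_logits`, `categorize_trigger`), and the tolerance `tol` are all assumptions chosen for illustration. It only shows the idea of labeling a trigger by how it shifts the target-class score on a benign classifier.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def linear_logits(weights, x):
    # Toy "benign classifier": one weight row per class, no bias.
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def categorize_trigger(weights, x, trigger, target, tol=0.01):
    """Label an additive trigger by its effect on the target-class probability.

    `tol` is a hypothetical threshold for "no significant impact".
    """
    before = softmax(linear_logits(weights, x))[target]
    x_triggered = [xi + ti for xi, ti in zip(x, trigger)]
    after = softmax(linear_logits(weights, x_triggered))[target]
    delta = after - before
    if delta > tol:
        return "positive"
    if delta < -tol:
        return "negative"
    return "neutral"

# Two-class toy model and an input sitting on the decision boundary.
WEIGHTS = [[1.0, 0.0], [0.0, 1.0]]
X = [1.0, 1.0]

if __name__ == "__main__":
    for trig in ([0.0, 0.5], [0.0, -0.5], [0.3, 0.3]):
        print(trig, categorize_trigger(WEIGHTS, X, trig, target=1))
```

In this toy setting, a perturbation that raises the target logit is labeled positive, one that lowers it is labeled negative, and one that shifts both logits equally leaves the softmax output unchanged and is labeled neutral. A real DNN would replace `linear_logits` with a forward pass of the clean-trained network.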