Abstract: Data-poisoning backdoor attacks are serious security threats to machine learning models, where an adversary can manipulate the training dataset to inject backdoors into models. In this paper, we focus on in-training backdoor defense, aiming to train a clean model even when the dataset may be potentially poisoned. Unlike most existing methods that primarily detect and remove/unlearn suspicious samples to mitigate malicious backdoor attacks, we propose a novel defense approach called PDB (Proactive Defensive Backdoor). Specifically, PDB leverages the home-field advantage of defenders by proactively injecting a defensive backdoor into the model during training. Taking advantage of controlling the training process, the defensive backdoor is designed to suppress the malicious backdoor effectively while remaining secret to attackers. In addition, we introduce a reversible mapping to determine the defensive target label. During inference, PDB embeds a defensive trigger in the inputs and reverses the model's prediction, suppressing malicious backdoor and ensuring the model's utility on the original task. Experimental results across various datasets and models demonstrate that our approach achieves state-of-the-art defense performance against a wide range of backdoor attacks. The code is available at https://github.com/shawkui/Proactive_Defensive_Backdoor.
What problem does this paper attempt to address?
This paper addresses how to defend against data-poisoning backdoor attacks while a machine learning model is being trained. Specifically, the authors aim to keep malicious backdoors from taking effect in the trained model even if the training data set may have been contaminated. Traditional defense methods mainly detect and remove suspicious samples. This paper instead proposes a new strategy, PDB (Proactive Defensive Backdoor), which actively injects a defensive backdoor into the model to counter malicious backdoors, suppressing their influence while maintaining the model's utility on the original task.
### Main contributions of the paper
1. **Innovative defense mechanism**: A new mechanism is proposed that counters malicious backdoors by injecting a defensive backdoor during the training process. This method does not require specific assumptions about potential malicious backdoor attacks.
2. **Design principles**: The main objectives of effectively defending against backdoors are analyzed, and four key design principles are proposed: reversibility, inaccessibility to attackers, minimal impact on model performance, and resistance to other backdoors.
3. **Experimental verification**: In extensive experiments, PDB is compared with five recent in-training defense methods against seven state-of-the-art data-poisoning backdoor attacks, across different model architectures and data sets. The results show that PDB performs comparably to or better than the existing baseline methods in the various attack scenarios.
### Main technical details
- **Problem setting**: Samples \(x\in X\) and labels \(y\in Y\) are defined, where \(Y = [1,\cdots,K]\) represents the candidate label space and \(X\) represents the sample space. A backdoor attack applies a trigger \(\Delta\) to a sample, producing the poisoned sample \(x\oplus\Delta\).
- **Threat model**: A data-poisoning scenario is considered. The attacker can only manipulate part of the training data set to implant triggers, but cannot control the training process. The defender faces a potentially contaminated data set and aims to train a model that does not activate malicious backdoors in the presence of malicious triggers and that maintains its utility on the original task.
- **Defensive backdoor design**:
  - **Reversibility**: The defensive backdoor must be reversible, so that the true label can be recovered from the model's prediction on a benign sample carrying the defensive trigger.
  - **Inaccessibility to attackers**: The defensive trigger \(\Delta_1\) should be carefully designed so that it is not reproducible and is difficult for attackers to discover.
  - **Minimal impact on model performance**: The modified samples should retain enough features of the original data to ensure accurate label recovery in the presence of the defensive trigger.
  - **Resistance to other backdoors**: The defensive backdoor should be able to resist various backdoor attacks, including known and potential future backdoors.
- **Backdoor injection**: A defensive poisoned data set \(\hat{D}_{\text{def}}\) is constructed and combined with the (possibly maliciously poisoned) training data set \(D_{\text{tr}}\) for model training. Training on both injects the defensive backdoor, which then suppresses malicious backdoors.
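The design and injection steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the cyclic-shift mapping `h`, the corner-patch trigger, the 0-indexed labels, the toy image type, and all function names are hypothetical choices made for this example. The summary itself only requires that the mapping be reversible and that the trigger stay secret.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]  # toy grayscale image: rows of pixel values

K = 10  # number of classes (assumed; labels are 0-indexed in this sketch)


def h(label: int, num_classes: int = K) -> int:
    """Assumed reversible mapping: cyclic shift to the next class."""
    return (label + 1) % num_classes


def h_inv(label: int, num_classes: int = K) -> int:
    """Inverse of h: shift back by one class."""
    return (label - 1) % num_classes


def add_defensive_trigger(image: Image, patch_value: float = 1.0, size: int = 2) -> Image:
    """Return a copy of the image with a size x size patch in the top-left corner.

    The patch stands in for the secret defensive trigger (Delta_1 in the text).
    """
    stamped = [row[:] for row in image]
    for r in range(min(size, len(stamped))):
        for c in range(min(size, len(stamped[r]))):
            stamped[r][c] = patch_value
    return stamped


def build_defensive_set(samples: List[Tuple[Image, int]]) -> List[Tuple[Image, int]]:
    """Build the defensive poisoned set: triggered inputs paired with mapped labels h(y)."""
    return [(add_defensive_trigger(x), h(y)) for x, y in samples]


def defended_predict(model: Callable[[Image], int], image: Image) -> int:
    """Inference: embed the defensive trigger, then reverse the model's prediction."""
    return h_inv(model(add_defensive_trigger(image)))
```

In this sketch the model would be trained on the union of `build_defensive_set(...)` and the original training data, and every query at deployment time would go through `defended_predict`, so that a correct prediction of `h(y)` on the triggered input maps back to `y`.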
### Experimental results
- **CIFAR-10 data set**: PDB performs excellently in multiple attack scenarios, especially in reducing the attack success rate (ASR) and maintaining a high defense effectiveness rating (DER).
- **Tiny ImageNet data set**: PDB also performs well on the ViT-B-16 model, and in particular defends effectively against multiple backdoor attacks even at a high poisoning ratio.
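For readers unfamiliar with the two metrics named above, the following sketch shows how they are commonly computed. The summary does not define them, so both definitions are assumptions here: ASR is taken as the fraction of triggered samples from non-target classes that the model assigns to the attacker's target label, and DER follows the form commonly used in backdoor-defense benchmarks, \((\max(0, \Delta\text{ASR}) - \max(0, \Delta\text{ACC}) + 1)/2\), where \(\Delta\) denotes the drop after applying the defense. The function names are hypothetical.

```python
from typing import Sequence


def attack_success_rate(preds: Sequence[int], true_labels: Sequence[int], target_label: int) -> float:
    """Fraction of triggered non-target-class samples predicted as the target label."""
    relevant = [p for p, y in zip(preds, true_labels) if y != target_label]
    if not relevant:
        return 0.0
    return sum(p == target_label for p in relevant) / len(relevant)


def defense_effectiveness_rating(acc_before: float, acc_after: float,
                                 asr_before: float, asr_after: float) -> float:
    """Assumed DER definition: rewards ASR reduction, penalizes accuracy loss; range [0, 1]."""
    asr_drop = max(0.0, asr_before - asr_after)
    acc_drop = max(0.0, acc_before - acc_after)
    return (asr_drop - acc_drop + 1.0) / 2.0
```

Under this assumed definition, a defense that removes an always-successful backdoor without losing accuracy scores 1, and a defense that changes neither accuracy nor ASR scores 0.5.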
### Conclusion
By actively injecting a defensive backdoor, PDB not only effectively resists multiple backdoor attacks but also performs well in maintaining the utility of the model for the original task. This method provides new ideas and directions for future backdoor defense research.