Abstract: Data-poisoning backdoor attacks are serious security threats to machine learning models, where an adversary can manipulate the training dataset to inject backdoors into models. In this paper, we focus on in-training backdoor defense, aiming to train a clean model even when the dataset may be potentially poisoned. Unlike most existing methods that primarily detect and remove/unlearn suspicious samples to mitigate malicious backdoor attacks, we propose a novel defense approach called PDB (Proactive Defensive Backdoor). Specifically, PDB leverages the home-field advantage of defenders by proactively injecting a defensive backdoor into the model during training. Taking advantage of controlling the training process, the defensive backdoor is designed to suppress the malicious backdoor effectively while remaining secret to attackers. In addition, we introduce a reversible mapping to determine the defensive target label. During inference, PDB embeds a defensive trigger in the inputs and reverses the model's prediction, suppressing malicious backdoor and ensuring the model's utility on the original task. Experimental results across various datasets and models demonstrate that our approach achieves state-of-the-art defense performance against a wide range of backdoor attacks. The code is available at https://github.com/shawkui/Proactive_Defensive_Backdoor.

What problem does this paper attempt to address?

This paper addresses how to defend against data-poisoning backdoor attacks while a machine learning model is being trained. Specifically, the authors aim to prevent malicious backdoors from taking effect in the trained model even if the training dataset may have been contaminated. Traditional defenses mainly detect and remove suspicious samples. This paper instead proposes a new strategy, PDB (Proactive Defensive Backdoor): the defender deliberately injects a defensive backdoor into the model to counter malicious backdoors, suppressing their influence while maintaining the model's utility on the original task.

### Main contributions of the paper

1. **Innovative defense mechanism**: A new mechanism counters malicious backdoors by injecting a proactive defensive backdoor during training. The method does not require specific assumptions about the potential malicious backdoor attack.
2. **Design principles**: The main objectives of an effective defensive backdoor are analyzed, and four key design principles are proposed: reversibility, inaccessibility to attackers, minimal impact on model performance, and resistance to other backdoors.
3. **Experimental verification**: PDB is compared with five recent in-training defense methods against seven state-of-the-art data-poisoning backdoor attacks, across different model architectures and datasets. The results show that PDB matches or outperforms the existing baselines in the various attack scenarios.

### Main technical details

- **Problem setting**: Samples \(x\in X\) and labels \(y\in Y\) are defined, where \(Y = [1,\cdots,K]\) is the candidate label space and \(X\) is the sample space. A backdoor attack is injected through a trigger \(\Delta\), which produces poisoned samples \(x\oplus\Delta\).
- **Threat model**: A data-poisoning scenario is considered. The attacker can manipulate only part of the training dataset to implant triggers and cannot control the training process. The defender faces a potentially contaminated dataset and aims to train a model that does not activate the malicious backdoor when a malicious trigger is present and that maintains its utility on the original task.
- **Defensive backdoor design**:
  - **Reversibility**: The defensive backdoor must be reversible, so that the true label can be recovered from the model's prediction on a benign sample carrying the defensive trigger.
  - **Inaccessibility to attackers**: The defensive trigger \(\Delta_1\) should be designed so that attackers cannot easily reproduce or discover it.
  - **Minimal impact on model performance**: The modified samples should retain enough features of the original data to allow accurate label recovery when the defensive trigger is present.
  - **Resistance to other backdoors**: The defensive backdoor should withstand a variety of backdoor attacks, including both known attacks and potential future ones.
- **Backdoor injection**: A defensive poisoned dataset \(\hat{D}_{\text{def}}\) is constructed and combined with the (possibly maliciously poisoned) training dataset \(D_{\text{tr}}\) for model training, which yields the defense against malicious backdoors.

### Experimental results

- **CIFAR-10 dataset**: PDB performs strongly across multiple attack scenarios, in particular reducing the attack success rate (ASR) while maintaining a high defense effectiveness rating (DER).
- **Tiny ImageNet dataset**: PDB also performs well with the ViT-B-16 model and remains effective against multiple backdoor attacks even at a high poisoning ratio.

### Conclusion

By proactively injecting a defensive backdoor, PDB resists multiple backdoor attacks while maintaining the model's utility on the original task. The method offers a new direction for future backdoor defense research.
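
The defensive backdoor described above can be illustrated with a minimal sketch. It is not the authors' implementation. The cyclic-shift label mapping, the corner-patch trigger, the 0-indexed labels, and all function names here are assumptions chosen only to show the idea: a reversible mapping picks the defensive target label, defender-chosen samples are stamped with a defensive trigger to build the defensive dataset, and at inference the trigger is applied and the prediction is mapped back.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]  # toy grayscale image: rows of pixel values in [0, 1]

K = 10  # number of classes (assumed); labels are 0-indexed here for simplicity


def defensive_label(y: int, num_classes: int = K) -> int:
    """Hypothetical reversible mapping h(y) = (y + 1) mod K."""
    return (y + 1) % num_classes


def recover_label(y_mapped: int, num_classes: int = K) -> int:
    """Inverse of defensive_label: undo the cyclic shift."""
    return (y_mapped - 1) % num_classes


def apply_defensive_trigger(x: Image, patch_value: float = 1.0, size: int = 2) -> Image:
    """Return a copy of x with a size-by-size patch in the top-left corner.

    The patch is an assumed stand-in for the defensive trigger; the input
    image itself is left unmodified.
    """
    out = [row[:] for row in x]
    for i in range(min(size, len(out))):
        for j in range(min(size, len(out[i]))):
            out[i][j] = patch_value
    return out


def build_defensive_set(samples: List[Tuple[Image, int]]) -> List[Tuple[Image, int]]:
    """Build the defensive poisoned set: triggered inputs paired with mapped labels."""
    return [(apply_defensive_trigger(x), defensive_label(y)) for x, y in samples]


def defended_predict(model: Callable[[Image], int], x: Image) -> int:
    """Inference-time procedure: embed the trigger, predict, then reverse the mapping."""
    return recover_label(model(apply_defensive_trigger(x)))
```

In this sketch, training would use the output of `build_defensive_set` together with the possibly poisoned training set, and every prediction at deployment would go through `defended_predict`. Any bijection on the label space could replace the cyclic shift, since the only property used is that the mapping has an inverse.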

Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor
