Abstract: Recent research suggests that machine learning models are highly susceptible to backdoor poisoning attacks. These attacks are easy to execute and achieve high success rates, because a model exhibits anomalous behavior even when only a small quantity of malicious data is incorporated into the training dataset. Among conventional backdoor defenses, fine-tuning is an invasive method that adjusts the parameters of the model's neurons to eliminate backdoors from the attacked model. However, because the same neurons serve both the original task and the backdoor task, fine-tuning degrades accuracy on the original task. To address this issue, we propose a non-invasive approach, the Anti-Backdoor Model (ABM), which does not modify the parameters of the attacked model. ABM employs an external model to counteract the influence of the backdoor task on the attacked model, thereby balancing backdoor elimination against preserving accuracy on the original task. Specifically, we first embed a controllable backdoor in the dataset and leverage the strong and weak relationships between backdoors to identify a small dataset with a high concentration of poisoned samples. Next, we train the attacked model (the teacher model) with the standard training method. Finally, we use this small dataset to train, via knowledge distillation, an external model (the student model) that focuses exclusively on backdoors and counteracts the backdoor task in the teacher model. We evaluate ABM against eight mainstream attacks on three standard public datasets. The results show that ABM has promising efficacy in eliminating the backdoor task while preserving the accuracy of the original task.
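The abstract states that the student model is trained by knowledge distillation from the teacher but does not give the loss. As a minimal sketch, assuming the widely used temperature-softened formulation (KL divergence between teacher and student distributions, scaled by the squared temperature), the objective could look like the following. The function names, the temperature value, and the example logits are illustrative assumptions, not details taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits (numerically stable)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Assumption: this is the standard distillation loss, used here only as a
    stand-in because the abstract does not specify ABM's exact objective.
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(
        pi * math.log(pi / max(qi, 1e-12))  # guard against underflow in q
        for pi, qi in zip(p, q)
        if pi > 0.0
    )
    return (temperature ** 2) * kl


# Hypothetical logits for one poisoned sample from the small distillation set:
# the teacher strongly favors class 1 (taken here as the backdoor target).
teacher_logits = [0.5, 4.0, 0.2]
student_logits = [0.4, 1.0, 0.3]
loss = distillation_loss(teacher_logits, student_logits)
```

Minimizing this loss over the concentrated poisoned set would push the student to reproduce the teacher's backdoor response on those samples. How the student's output is then combined with the teacher's at inference time to cancel the backdoor is not described in the abstract and is deliberately left out of this sketch.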

Anti-Backdoor Model: A Novel Algorithm to Remove Backdoors in a Non-invasive Way
