Abstract: Recent research shows that machine learning models are highly susceptible to backdoor poisoning attacks. These attacks are easy to execute and achieve high success rates, because a model exhibits anomalous behavior even when only a small quantity of malicious data is incorporated into the training dataset. Among conventional backdoor defenses, fine-tuning is an invasive method that adjusts the parameters of the model's neurons to eliminate backdoors in the attacked model. However, because the same neurons are responsible for both the original task and the backdoor task, fine-tuning also reduces the accuracy of the original task. To address this issue, we propose a non-invasive approach, the Anti-Backdoor Model (ABM), which does not modify the parameters of the attacked model. ABM employs an external model to counteract the influence of the backdoor task on the attacked model, thereby balancing backdoor elimination against preserving the accuracy of the original task. Specifically, our approach first embeds a controllable backdoor in the dataset and leverages the strong and weak relationships between backdoors to identify a small, highly concentrated poisoned dataset. Next, we train the attacked model (the teacher model) with the standard training method. Finally, we use this small poisoned dataset to train an external model (the student model) by knowledge distillation; the student focuses exclusively on backdoors and counteracts the backdoor task in the teacher model. In the experiments, we assess the effectiveness of ABM against eight mainstream attacks on three standard public datasets. The results show that ABM is effective at eliminating the backdoor task while preserving the accuracy of the original task.
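
The teacher/student arrangement described in the abstract can be illustrated with a minimal sketch. The abstract does not state the distillation loss or how the student's output is combined with the teacher's, so both are assumptions here: `distillation_loss` is the generic temperature-softened KL divergence used in standard knowledge distillation, and `counteract` is a hypothetical combination rule (subtracting the student's logits from the frozen teacher's logits). The function names, the `temperature` and `strength` parameters, and the example logits are all illustrative and are not taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Generic distillation loss: KL(teacher || student) on softened outputs.

    Assumption: the paper's actual loss may differ from this standard form.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


def counteract(teacher_logits, student_logits, strength=1.0):
    """Hypothetical combination rule (not from the paper).

    The teacher's parameters stay untouched; the backdoor-only student's
    logits are subtracted from the teacher's logits at inference time.
    """
    return [t - strength * s for t, s in zip(teacher_logits, student_logits)]


# Illustrative logits for one triggered input; class 1 is the backdoor target.
teacher = [1.0, 5.0, 0.5]   # attacked teacher is pulled toward class 1
student = [0.0, 6.0, 0.0]   # student trained only on the isolated poisoned set
combined = counteract(teacher, student)
predicted = max(range(len(combined)), key=combined.__getitem__)
```

In this toy example the combined logits no longer favor the backdoor target class, while the teacher's weights are never modified, which is the non-invasive property the abstract emphasizes. On clean inputs, a student that has learned only the backdoor would ideally output near-uniform logits and leave the teacher's prediction unchanged; that behavior is also an assumption of this sketch.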
