Abstract: Recent research suggests that machine learning models are highly susceptible to backdoor poisoning attacks. These attacks are easy to execute and achieve high success rates, because a model exhibits anomalous behavior even when only a small quantity of malicious data is incorporated into the training dataset. Among conventional backdoor defenses, fine-tuning is an invasive method that adjusts the parameters of the model's neurons to eliminate backdoors from the attacked model. However, because the same neurons are responsible for both the original task and the backdoor task, fine-tuning also reduces accuracy on the original task. To address this issue, we propose a non-invasive approach, the Anti-Backdoor Model (ABM), which does not modify the parameters of the attacked model. ABM employs an external model to counteract the influence of the backdoor task on the attacked model, thereby balancing backdoor elimination against preserving accuracy on the original task. Specifically, our approach first embeds a controllable backdoor in the dataset and leverages the strong and weak relationships between backdoors to identify a highly concentrated poisoned dataset. Next, we train the attacked model (the teacher model) with the standard training method. Finally, we use this low-volume poisoned dataset to train, by means of knowledge distillation, an external model (the student model) that focuses exclusively on backdoors and counteracts the backdoor task in the teacher model. In our experiments, we assess the effectiveness of ABM against eight mainstream attacks on three standard public datasets. The results show that ABM has promising efficacy in eliminating the backdoor task while preserving the accuracy of the original task.
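The teacher/student idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the student's training target or how the two models' outputs are merged, so everything here is an assumption made for illustration: `distillation_loss` is the standard temperature-softened KL objective used in knowledge distillation generally, `counteract` is a hypothetical combination rule (subtracting the student's logits from the teacher's), and the toy logits and the choice of class 2 as the backdoor target are invented values, not results from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Standard KD objective: KL(teacher || student) on softened outputs.

    This is the generic distillation loss, not necessarily the exact
    loss used by ABM.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


def counteract(teacher_logits, student_logits):
    """ASSUMPTION: cancel the backdoor by subtracting the student's logits.

    The teacher's parameters are never touched, which is what makes the
    defense non-invasive; the subtraction rule itself is hypothetical.
    """
    return [t - s for t, s in zip(teacher_logits, student_logits)]


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


# Toy 3-class logits (invented); class 2 plays the backdoor target.
teacher_on_triggered = [1.0, 0.5, 4.0]  # teacher fooled by the trigger
student_on_triggered = [0.0, 0.0, 3.8]  # student fires only on the backdoor
teacher_on_clean = [3.0, 0.5, 0.2]
student_on_clean = [0.0, 0.0, 0.1]      # student stays nearly silent

triggered_pred = argmax(counteract(teacher_on_triggered, student_on_triggered))
clean_pred = argmax(counteract(teacher_on_clean, student_on_clean))
```

In this toy setting the teacher alone predicts the backdoor target on the triggered input, while the combined output falls back to the same class as on the clean input. A real student would be a trained network rather than hand-written logits.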

Anti-Backdoor Model: A Novel Algorithm to Remove Backdoors in a Non-invasive Way