Abstract: Deep neural networks (DNNs) have been widely used in many critical applications, such as autonomous vehicles and medical diagnosis. However, their security is threatened by backdoor attacks, which are carried out by adding artificial patterns to specific training data. Existing defense strategies primarily use reverse engineering to reproduce the backdoor trigger generated by attackers, and then repair the DNN model by adding the trigger to inputs and fine-tuning the model with ground-truth labels. However, when the attackers' trigger is complex and invisible, the defender cannot reproduce it successfully, and the DNN model is not repaired because the trigger is not effectively removed. In this work, we propose Adversarial Feature Map Pruning for Backdoor (FMP) to mitigate backdoors in DNNs. Unlike existing defense strategies, which focus on reproducing backdoor triggers, FMP attempts to prune backdoor feature maps, which are trained to extract backdoor information from inputs. After pruning these backdoor feature maps, FMP fine-tunes the model with a secure subset of training data. Our experiments demonstrate two advantages over existing defense strategies. First, FMP can effectively reduce the Attack Success Rate (ASR) even against the most complex and invisible attack triggers (e.g., FMP decreases the ASR to 2.86% on CIFAR10, which is 19.2% to 65.41% lower than baselines). Second, unlike conventional defense methods that tend to exhibit low Robust Accuracy (RA, that is, the accuracy of the model on poisoned data), FMP achieves a higher RA, indicating its superiority in maintaining model performance while mitigating the effects of backdoor attacks (e.g., FMP obtains 87.40% RA on CIFAR10). Our code is publicly available at: https://github.com/retsuh-bqw/FMP
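
The two metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code. The function names, the choice to exclude samples whose true label already equals the attacker's target from the ASR, and the choice to compute RA over all poisoned samples are assumptions made for illustration; the abstract only states that RA is the accuracy of the model on poisoned data.

```python
from typing import Sequence


def attack_success_rate(preds: Sequence[int],
                        true_labels: Sequence[int],
                        target_label: int) -> float:
    """ASR: share of poisoned inputs that the model assigns to the target label.

    Assumption: samples whose true label is already the target are skipped,
    since predicting the target on them is not evidence of a backdoor.
    """
    eligible = [(p, y) for p, y in zip(preds, true_labels) if y != target_label]
    if not eligible:
        return 0.0
    return sum(p == target_label for p, _ in eligible) / len(eligible)


def robust_accuracy(preds: Sequence[int], true_labels: Sequence[int]) -> float:
    """RA: accuracy on poisoned inputs, measured against their true labels.

    Assumption: computed over all poisoned samples passed in.
    """
    if not true_labels:
        return 0.0
    return sum(p == y for p, y in zip(preds, true_labels)) / len(true_labels)


# Hypothetical predictions on five poisoned inputs; the attacker's target is class 0.
preds = [0, 0, 3, 2, 0]
true_labels = [1, 2, 3, 2, 0]
asr = attack_success_rate(preds, true_labels, target_label=0)
ra = robust_accuracy(preds, true_labels)
```

A successful defense in the sense of the abstract drives the first number down while keeping the second one high.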

Uncovering, Explaining, and Mitigating the Superficial Safety of Backdoor Defense

Redeem Myself: Purifying Backdoors in Deep Learning Models Using Self Attention Distillation

Unveiling and Mitigating Backdoor Vulnerabilities based on Unlearning Weight Changes and Backdoor Activeness

Adversarial Neuron Pruning Purifies Backdoored Deep Models

PAD-FT: A Lightweight Defense for Backdoor Attacks via Data Purification and Fine-Tuning

Purifier: Plug-and-play Backdoor Mitigation for Pre-trained Models Via Anomaly Activation Suppression

Towards Stable Backdoor Purification through Feature Shift Tuning

Shared Adversarial Unlearning: Backdoor Mitigation by Unlearning Shared Adversarial Examples

From Toxic to Trustworthy: Using Self-Distillation and Semi-supervised Methods to Refine Neural Networks

Adversarial Feature Map Pruning for Backdoor

Backdoor Defense via Decoupling the Training Process

Towards A Critical Evaluation of Robustness for Deep Learning Backdoor Countermeasures

Towards Unified Robustness Against Both Backdoor and Adversarial Attacks

Fusing Pruned and Backdoored Models: Optimal Transport-based Data-free Backdoor Mitigation

Breaking the False Sense of Security in Backdoor Defense through Re-Activation Attack

Augmented Neural Fine-Tuning for Efficient Backdoor Purification

Reconstructive Neuron Pruning for Backdoor Defense

BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input Detection

Beating Backdoor Attack at Its Own Game