Abstract: Backdoor attacks aim to inject backdoors into victim machine learning models at training time, such that the backdoored model retains the predictive power of the original model on clean inputs but misbehaves on inputs that carry the trigger. Such attacks are practical because resource-limited users usually download sophisticated models from model zoos or query models through MLaaS rather than training a model from scratch, which gives a malicious third party the chance to provide a backdoored model. In general, the more precious the provided model (i.e., a model trained on a rare dataset), the more popular it is with users. In this article, from the perspective of a malicious model provider, we propose a black-box backdoor attack, named B3, in which neither the rare victim model (including its architecture, parameters, and hyperparameters) nor its training data is available to the adversary. To make backdoor attacks feasible in this black-box scenario, we design a cost-effective model extraction method that uses a carefully constructed query dataset to steal the functionality of the victim model within a limited query budget. Because the trigger is key to a successful backdoor attack, we develop a novel trigger generation algorithm that strengthens the bond between the trigger and the targeted misclassification label through the neuron with the highest impact on that label. We conduct extensive experiments on various simulated deep learning models and on the commercial API of Alibaba Cloud Compute Service. The results show that B3 achieves a high attack success rate while maintaining high prediction accuracy on benign inputs. B3 is also robust against state-of-the-art backdoor defenses, such as model pruning and NC.
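As a rough illustration of the model extraction step mentioned in the abstract, the sketch below trains a substitute model purely from a black-box victim's predicted labels under a fixed query budget. It is not the authors' B3 method: the toy victim (`victim_predict`), the uniform query sampler (`build_query_set`), the logistic-regression substitute, and all parameter values are hypothetical choices made only to keep the example small and self-contained.

```python
import math
import random


def victim_predict(x):
    """Hypothetical black-box victim: the adversary sees only the label."""
    # Hidden decision rule; in the threat model this is unknown to the adversary.
    return 1 if 2.0 * x[0] - 1.0 * x[1] + 0.5 > 0 else 0


def build_query_set(budget, rng):
    """Sample `budget` query inputs uniformly from [-1, 1]^2 (an assumption)."""
    return [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(budget)]


def train_substitute(queries, labels, epochs=200, lr=0.5):
    """Fit a logistic-regression substitute to the victim's answers."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(queries)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for (x0, x1), y in zip(queries, labels):
            p = 1.0 / (1.0 + math.exp(-(w[0] * x0 + w[1] * x1 + b)))
            err = p - y  # gradient of the log-loss w.r.t. the logit
            grad_w[0] += err * x0
            grad_w[1] += err * x1
            grad_b += err
        # Full-batch gradient descent step on the mean loss.
        w[0] -= lr * grad_w[0] / n
        w[1] -= lr * grad_w[1] / n
        b -= lr * grad_b / n
    return w, b


def substitute_predict(w, b, x):
    """Label predicted by the extracted substitute model."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


def fidelity(w, b, points):
    """Fraction of points on which substitute and victim agree."""
    agree = sum(substitute_predict(w, b, x) == victim_predict(x) for x in points)
    return agree / len(points)


if __name__ == "__main__":
    rng = random.Random(0)
    queries = build_query_set(200, rng)             # query budget of 200
    labels = [victim_predict(q) for q in queries]   # one victim call per query
    w, b = train_substitute(queries, labels)
    held_out = build_query_set(500, rng)
    print("fidelity on held-out inputs:", fidelity(w, b, held_out))
```

The fidelity score (agreement between substitute and victim on fresh inputs) is one common way to judge how well functionality has been copied; a real attack would replace the uniform sampler with a more carefully constructed query dataset.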

B3: Backdoor Attacks Against Black-box Machine Learning Models
