Abstract: Backdoor attacks inject backdoors into victim machine learning models at training time, such that the backdoored model retains the prediction power of the original model on clean inputs but misbehaves on inputs that carry the trigger. Such attacks are feasible because resource-limited users usually download sophisticated models from model zoos or query models through MLaaS rather than train a model from scratch, which gives a malicious third party the chance to provide a backdoored model. In general, the more precious the provided model (i.e., a model trained on a rare dataset), the more popular it is with users. In this article, from the perspective of a malicious model provider, we propose a black-box backdoor attack, named B3, in which neither the rare victim model (including its architecture, parameters, and hyperparameters) nor its training data is available to the adversary. To facilitate backdoor attacks in the black-box scenario, we design a cost-effective model extraction method that uses a carefully constructed query dataset to steal the functionality of the victim model within a limited budget. Because the trigger is key to a successful backdoor attack, we develop a novel trigger generation algorithm that strengthens the bond between the trigger and the targeted misclassification label through the neuron with the highest impact on that label. Extensive experiments have been conducted on various simulated deep learning models and on the commercial API of Alibaba Cloud Compute Service. We demonstrate that B3 achieves a high attack success rate while maintaining high prediction accuracy on benign inputs. We also show that B3 is robust against state-of-the-art backdoor defenses, such as model pruning and NC.
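To make the model extraction step more concrete, the following is a minimal sketch of the general idea only: label a limited number of queries with a black-box victim, then fit a substitute on those labels. It is not the B3 extraction method. The substitute here is a plain softmax regression, the queries are sampled uniformly from a pool rather than carefully constructed, and every name (`extract_substitute`, `substitute_predict`, `agreement`, `toy_victim`) is a hypothetical chosen for illustration.

```python
import numpy as np

def extract_substitute(victim_predict, query_pool, budget, num_classes,
                       epochs=200, lr=0.5, seed=0):
    """Fit a softmax-regression substitute from hard labels returned by a
    black-box victim, spending at most `budget` queries (illustrative only)."""
    rng = np.random.default_rng(seed)
    n_queries = min(budget, len(query_pool))
    # Assumption: uniform sampling stands in for a constructed query dataset.
    idx = rng.choice(len(query_pool), size=n_queries, replace=False)
    X = query_pool[idx]
    y = victim_predict(X)              # the only access to the victim
    Y = np.eye(num_classes)[y]         # one-hot encode the returned labels
    W = np.zeros((X.shape[1], num_classes))
    b = np.zeros(num_classes)
    for _ in range(epochs):
        logits = X @ W + b
        logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P = P / P.sum(axis=1, keepdims=True)
        G = (P - Y) / n_queries        # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (X.T @ G)
        b -= lr * G.sum(axis=0)
    return W, b

def substitute_predict(W, b, X):
    """Hard-label prediction of the substitute model."""
    return np.argmax(X @ W + b, axis=1)

def agreement(victim_predict, W, b, X):
    """Fraction of inputs on which the substitute matches the victim."""
    return float(np.mean(victim_predict(X) == substitute_predict(W, b, X)))

# Toy usage: the "victim" is a hidden linear rule that exposes labels only.
_w_hidden = np.array([1.0, -2.0])

def toy_victim(X):
    return (X @ _w_hidden > 0).astype(int)

_rng = np.random.default_rng(1)
pool = _rng.normal(size=(2000, 2))
held_out = _rng.normal(size=(1000, 2))
W, b = extract_substitute(toy_victim, pool, budget=300, num_classes=2)
print("agreement with victim on held-out inputs:", agreement(toy_victim, W, b, held_out))
```

In a backdoor setting, the adversary would then work on the substitute (for example, to craft a trigger) instead of the inaccessible victim; agreement on held-out inputs is one simple way to check how well the substitute mimics the victim for a given budget.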

Backdoor Learning: A Survey

B3: Backdoor Attacks Against Black-box Machine Learning Models

Backdoor Attacks and Defenses in Federated Learning: State-of-the-Art, Taxonomy, and Future Directions

Survey on Backdoor Attacks and Countermeasures in Deep Neural Network

Backdoor Attacks to Deep Neural Networks: A Survey of the Literature, Challenges, and Future Research Directions

BackdoorBench: A Comprehensive Benchmark and Analysis of Backdoor Learning

Anti-Backdoor Learning: Training Clean Models on Poisoned Data

Backdoor Attacks and Countermeasures on Deep Learning: A Comprehensive Review

Beating Backdoor Attack at Its Own Game

Backdoor Defense via Decoupling the Training Process

DLP: towards active defense against backdoor attacks with decoupled learning process

Towards Practical Deployment-Stage Backdoor Attack on Deep Neural Networks

Backdoor Vulnerabilities in Normally Trained Deep Learning Models

Untargeted Backdoor Attack Against Object Detection

Backdoor Defense via Adaptively Splitting Poisoned Dataset

AdvDoor: Adversarial Backdoor Attack of Deep Learning System

Black-box Detection of Backdoor Attacks with Limited Information and Data

Backdoor Attack in the Physical World

Escaping Backdoor Attack Detection of Deep Learning

Unveiling and Mitigating Backdoor Vulnerabilities based on Unlearning Weight Changes and Backdoor Activeness