Abstract: Backdoor attacks inject poisoning samples during training, with the goal of forcing a machine learning model to output an attacker-chosen class when presented a specific trigger at test time. Although backdoor attacks have been demonstrated in a variety of settings and against different models, the factors affecting their effectiveness are still not well understood. In this work, we provide a unifying framework to study the process of backdoor learning under the lens of incremental learning and influence functions. We show that the effectiveness of backdoor attacks depends on: (i) the complexity of the learning algorithm, controlled by its hyperparameters; (ii) the fraction of backdoor samples injected into the training set; and (iii) the size and visibility of the backdoor trigger. These factors affect how fast a model learns to correlate the presence of the backdoor trigger with the target class. Our analysis unveils the intriguing existence of a region in the hyperparameter space in which the accuracy on clean test samples is still high while backdoor attacks are ineffective, thereby suggesting novel criteria to improve existing defenses.

What problem does this paper attempt to address?

This paper aims to understand and explain the effectiveness of backdoor attacks on machine-learning models, and in particular the factors behind their success. The authors study the backdoor learning process through a new framework in order to identify the main factors that make a model vulnerable to backdoor attacks. The specific problems the paper addresses are:

1. **Effectiveness of backdoor attacks**:
   - Backdoor attacks inject poisoning samples during training, causing the model to output the attacker-specified class when it encounters a specific trigger at test time. Although such attacks have been demonstrated in a variety of settings, the key factors behind their effectiveness are not fully understood.
2. **Main factors affecting the success of backdoor attacks**:
   - The authors propose a unified framework that studies the backdoor learning process from the perspectives of incremental learning and influence functions. They find that the following three factors significantly affect the success of backdoor attacks:
     - **Complexity of the learning algorithm**: Controlled by its hyperparameters.
     - **Proportion of backdoor samples injected into the training set**.
     - **Size and visibility of the backdoor trigger**.
3. **Identifying model configurations with high accuracy and resistance to backdoor attacks**:
   - The authors find a region in the hyperparameter space where the model's accuracy on clean test samples remains high while backdoor attacks are ineffective. This suggests new criteria for improving existing defense mechanisms.

### Specific contributions

To achieve the above goals, the paper makes the following specific contributions:

- **Introducing Backdoor Learning Curves**: A tool for comprehensively characterizing the backdoor learning process.
- **Defining Backdoor Learning Slope**: To quantify the speed at which the classifier learns the backdoor.
- **Identifying three important factors affecting the success of backdoor attacks**: Namely, the complexity of the learning algorithm, the proportion of injected backdoor samples, and the size and visibility of the trigger.
- **Revealing the robust region in the hyperparameter space**: In this region, the classifier maintains high accuracy on clean samples while resisting backdoor attacks, supporting the development of new defense strategies.

### Experimental results

Through experimental analysis, the authors verify the influence of these factors on backdoor learning and show how selecting appropriate hyperparameters can improve a model's robustness against backdoor attacks. For example, highly regularized classifiers are more robust to backdoor attacks, while larger and more visible triggers accelerate the backdoor learning process but are also easier to detect.

In conclusion, the paper reveals the key factors affecting the success of backdoor attacks and provides a theoretical basis and practical means for developing more effective defense mechanisms.
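
The three factors named in the summary (regularization strength, fraction of injected backdoor samples, and trigger size and visibility) can be explored with a small experiment. The following is only a minimal sketch under stated assumptions, not the authors' setup: it uses synthetic Gaussian data, a hand-written L2-regularized logistic regression, and a trigger that overwrites the last few features with a constant. All function names and parameters (`make_data`, `add_trigger`, `poison`, `run`, `lam`, `fraction`, `size`, `value`) are hypothetical and chosen for illustration.

```python
import math
import random


def make_data(n, rng, d=6):
    """Two Gaussian classes separated along the first two features.
    The remaining features are low-variance noise (where the trigger will go)."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        mean = 1.5 if label == 1 else -1.5
        x = [rng.gauss(mean, 1.0) for _ in range(2)]
        x += [rng.gauss(0.0, 0.1) for _ in range(d - 2)]
        X.append(x)
        y.append(label)
    return X, y


def add_trigger(x, size, value):
    """Assumed trigger: overwrite the last `size` features with `value`.
    `size` stands in for trigger size, `value` for its visibility."""
    return list(x[:len(x) - size]) + [value] * size


def poison(X, y, fraction, size, value, rng, target=1):
    """Inject triggered copies of non-target samples, relabelled as `target`."""
    n_poison = int(round(fraction * len(X)))
    sources = [x for x, label in zip(X, y) if label != target]
    Xp = [add_trigger(rng.choice(sources), size, value) for _ in range(n_poison)]
    return X + Xp, y + [target] * n_poison


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(X, y, lam, epochs=200, lr=0.1):
    """Gradient descent on mean logistic loss + (lam / 2) * ||w||^2."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wj for wj in w], 0.0
        for x, label in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - label
            for j in range(d):
                gw[j] += err * x[j] / n
            gb += err / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0


def run(lam, fraction, size, value, seed=0, target=1):
    """Return (clean test accuracy, backdoor success rate) for one configuration."""
    rng = random.Random(seed)
    Xtr, ytr = make_data(200, rng)
    Xte, yte = make_data(200, rng)
    Xtr, ytr = poison(Xtr, ytr, fraction, size, value, rng, target)
    w, b = train(Xtr, ytr, lam)
    clean_acc = sum(predict(w, b, x) == t for x, t in zip(Xte, yte)) / len(Xte)
    victims = [x for x, t in zip(Xte, yte) if t != target]
    hits = sum(predict(w, b, add_trigger(x, size, value)) == target for x in victims)
    return clean_acc, hits / len(victims)


if __name__ == "__main__":
    # Sweep regularization strength and poisoning fraction for a fixed trigger.
    for lam in (0.001, 1.0):
        for fraction in (0.02, 0.2):
            acc, asr = run(lam, fraction, size=2, value=3.0)
            print(f"lam={lam} fraction={fraction} clean_acc={acc:.2f} backdoor_success={asr:.2f}")
```

Varying `lam`, `fraction`, `size`, and `value` in the sweep gives a rough feel for how each factor trades clean accuracy against backdoor success on this toy problem; the numbers it prints say nothing about the results reported in the paper.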

Backdoor Learning Curves: Explaining Backdoor Poisoning Beyond Influence Functions

Demystifying Poisoning Backdoor Attacks from a Statistical Perspective

Anti-Backdoor Learning: Training Clean Models on Poisoned Data

Circumventing Backdoor Defenses That Are Based on Latent Separability

Bag of tricks for backdoor learning

Backdoor Learning: A Survey.

Systematic Evaluation of Backdoor Data Poisoning Attacks on Image Classifiers

Do Backdoors Assist Membership Inference Attacks?

Boosting Backdoor Attack with A Learnable Poisoning Sample Selection Strategy

Towards A Proactive ML Approach for Detecting Backdoor Poison Samples

Effective Backdoor Defense by Exploiting Sensitivity of Poisoned Samples

DLP: towards active defense against backdoor attacks with decoupled learning process

Beating Backdoor Attack at Its Own Game

Rethinking Backdoor Attacks

Backdoor Attacks Against Incremental Learners: An Empirical Evaluation Study

Reverse Engineering Imperceptible Backdoor Attacks on Deep Neural Networks for Detection and Training Set Cleansing

Exploiting Machine Unlearning for Backdoor Attacks in Deep Learning System

Universal Backdoor Attacks

Enhanced Coalescence Backdoor Attack Against DNN Based on Pixel Gradient

A General Framework for Defending Against Backdoor Attacks via Influence Graph
