Backdoor Learning Curves: Explaining Backdoor Poisoning Beyond Influence Functions

Antonio Emanuele Cinà, Kathrin Grosse, Sebastiano Vascon, Ambra Demontis, Battista Biggio, Fabio Roli, Marcello Pelillo
DOI: https://doi.org/10.1007/s13042-024-02363-5
2024-12-16
Abstract: Backdoor attacks inject poisoning samples during training, with the goal of forcing a machine learning model to output an attacker-chosen class when presented with a specific trigger at test time. Although backdoor attacks have been demonstrated in a variety of settings and against different models, the factors affecting their effectiveness are still not well understood. In this work, we provide a unifying framework to study the process of backdoor learning under the lens of incremental learning and influence functions. We show that the effectiveness of backdoor attacks depends on: (i) the complexity of the learning algorithm, controlled by its hyperparameters; (ii) the fraction of backdoor samples injected into the training set; and (iii) the size and visibility of the backdoor trigger. These factors affect how fast a model learns to correlate the presence of the backdoor trigger with the target class. Our analysis unveils the intriguing existence of a region in the hyperparameter space in which the accuracy on clean test samples is still high while backdoor attacks are ineffective, thereby suggesting novel criteria to improve existing defenses.
Machine Learning, Cryptography and Security
What problem does this paper attempt to address?
This paper aims to explain what makes backdoor attacks on machine-learning models effective. The authors introduce a framework for studying the backdoor learning process and use it to identify the main factors that determine how vulnerable a model is to such attacks. Specifically, the paper addresses the following problems:

1. **Effectiveness of backdoor attacks**: Backdoor attacks inject poisoning samples during training so that the model outputs the attacker-chosen class when it encounters a specific trigger at test time. Although such attacks have been demonstrated in a variety of settings, the key factors behind their effectiveness are not fully understood.
2. **Main factors affecting the success of backdoor attacks**: The authors propose a unifying framework that studies the backdoor learning process from the perspectives of incremental learning and influence functions. They find that three factors significantly affect the success of backdoor attacks:
   - **Complexity of the learning algorithm**, controlled by its hyperparameters.
   - **Fraction of backdoor samples injected into the training set**.
   - **Size and visibility of the backdoor trigger**.
3. **Identifying model configurations with high accuracy and resistance to backdoor attacks**: The authors find a region in the hyperparameter space where accuracy on clean test samples remains high while backdoor attacks are ineffective. This suggests new criteria for improving existing defenses.

### Specific contributions

To achieve these goals, the paper makes the following contributions:

- **Introducing Backdoor Learning Curves**: a tool for comprehensively characterizing the backdoor learning process.
- **Defining the Backdoor Learning Slope**: a measure of how fast the classifier learns the backdoor.
- **Identifying three important factors affecting the success of backdoor attacks**: the complexity of the learning algorithm, the fraction of injected backdoor samples, and the size and visibility of the trigger.
- **Revealing a robust region in the hyperparameter space**: in this region, the classifier maintains high accuracy on clean samples while resisting backdoor attacks, which supports the development of new defense strategies.

### Experimental results

Through experimental analysis, the authors verify the influence of these factors on backdoor learning and show how selecting appropriate hyperparameters can improve a model's robustness against backdoor attacks. For example, highly regularized classifiers are more robust to backdoor attacks, while larger and more visible triggers accelerate backdoor learning but are also easier to detect.

In conclusion, the paper reveals the key factors behind the success of backdoor attacks and provides a theoretical basis and technical means for developing more effective defenses.
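To make the incremental-learning view more concrete, here is a minimal sketch of how a backdoor learning curve might be traced on toy data. It is not the authors' implementation. It assumes one plausible reading of the framework: the poisoning samples enter the training objective with a weight `beta` that grows from 0 to 1, and the curve records the loss on triggered test points, scored against the target class, at each value of `beta`. The synthetic data, the trigger (overwriting the last feature with `TRIGGER_VALUE`), the L2-regularized logistic regression, the finite-difference slope, and every function name are hypothetical choices made for illustration.

```python
import math
import random

TRIGGER_VALUE = 3.0  # hypothetical trigger: the last feature is forced to this value
TARGET = 1           # attacker-chosen class, with labels in {-1, +1}
BETAS = [0.0, 0.25, 0.5, 0.75, 1.0]  # weight given to the poisoning samples

def sample(rng, label, dim=3):
    """Clean point: feature 0 carries the class signal, the others are noise."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    x[0] += 2.0 * label
    return x

def add_trigger(x):
    """Stamp the trigger on a copy of x by overwriting its last feature."""
    return x[:-1] + [TRIGGER_VALUE]

def score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def mean_log_loss(w, b, data):
    """Mean of log(1 + exp(-y * s)) over (x, y) pairs, computed stably."""
    total = 0.0
    for x, y in data:
        z = -y * score(w, b, x)
        total += max(z, 0.0) + math.log1p(math.exp(-abs(z)))
    return total / len(data)

def fit(clean, poison, beta, lam, steps=200, lr=0.2):
    """Gradient descent on (1/n) * [clean loss + beta * poison loss] + (lam/2) * ||w||^2."""
    n, dim = len(clean), len(clean[0][0])
    weighted = [(x, y, 1.0) for x, y in clean] + [(x, y, beta) for x, y in poison]
    w, b = [0.0] * dim, 0.0
    for _ in range(steps):
        grad_w, grad_b = [lam * wi for wi in w], 0.0
        for x, y, weight in weighted:
            # d/ds log(1 + exp(-y*s)) = -y * sigmoid(-y*s); the tanh form avoids overflow
            g = -y * 0.5 * (1.0 - math.tanh(0.5 * y * score(w, b, x))) * weight / n
            grad_w = [gw + g * xi for gw, xi in zip(grad_w, x)]
            grad_b += g
        w = [wi - lr * gw for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

rng = random.Random(0)
clean_train = [(sample(rng, y), y) for y in [-1, 1] * 100]
# Poisoning samples: non-target points that are triggered and relabeled as TARGET.
poison_train = [(add_trigger(sample(rng, -1)), TARGET) for _ in range(20)]
# Backdoor test set: unseen non-target points with the trigger, scored against TARGET.
backdoor_test = [(add_trigger(sample(rng, -1)), TARGET) for _ in range(100)]

def backdoor_learning_curve(lam):
    """Backdoor test loss as the poison weight beta grows from 0 to 1."""
    curve = []
    for beta in BETAS:
        w, b = fit(clean_train, poison_train, beta, lam)
        curve.append(mean_log_loss(w, b, backdoor_test))
    return curve

# Compare a weakly and a strongly regularized classifier.
curves = {lam: backdoor_learning_curve(lam) for lam in (0.01, 1.0)}
for lam, curve in curves.items():
    slope = (curve[1] - curve[0]) / (BETAS[1] - BETAS[0])  # crude slope near beta = 0
    print(f"lam={lam}: loss {curve[0]:.2f} -> {curve[-1]:.2f}, slope near 0: {slope:.2f}")
```

In this toy setting, a curve that falls quickly as `beta` grows means the backdoor is learned fast. Comparing the two values of `lam` gives a rough feel for why stronger regularization can flatten the curve. The finite-difference slope here is only a crude proxy and should not be read as the paper's definition of the backdoor learning slope.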