Protecting against simultaneous data poisoning attacks

Neel Alex, Shoaib Ahmed Siddiqui, Amartya Sanyal, David Krueger
2024-08-24
Abstract: Current backdoor defense methods are evaluated against a single attack at a time. This is unrealistic, as powerful machine learning systems are trained on large datasets scraped from the internet, which may be attacked multiple times by one or more attackers. We demonstrate that simultaneously executed data poisoning attacks can effectively install multiple backdoors in a single model without substantially degrading clean accuracy. Furthermore, we show that existing backdoor defense methods do not effectively prevent attacks in this setting. Finally, we leverage insights into the nature of backdoor attacks to develop a new defense, BaDLoss, that is effective in the multi-attack setting. With minimal clean accuracy degradation, BaDLoss attains an average attack success rate in the multi-attack setting of 7.98% on CIFAR-10 and 10.29% on GTSRB, compared to averages of 64.48% and 84.28%, respectively, for other defenses.
Machine Learning
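The abstract does not say how the individual attacks are built. The sketch below is a minimal illustration of "simultaneous" poisoning of a single dataset. It assumes BadNets-style dirty-label attacks, in which each hypothetical attacker stamps its own small patch trigger onto a disjoint random subset of training images and relabels those images to its own target class. The function name `poison_multi`, the trigger shapes and positions, and the 1% poisoning rate are illustrative assumptions. They are not the paper's settings.

```python
import numpy as np

def poison_multi(images, labels, attacks, rate, seed=0):
    """Apply several independent patch-trigger attacks to one dataset.

    images:  float array (N, H, W, C) with values in [0, 1]
    labels:  int array (N,)
    attacks: list of (trigger, (row, col), target_class), one per attacker;
             each trigger is an (h, w, C) patch (hypothetical format)
    rate:    fraction of the dataset each attacker poisons
    Returns poisoned copies plus a dict of poisoned indices per attacker.
    """
    rng = np.random.default_rng(seed)
    x, y = images.copy(), labels.copy()
    n = len(x)
    available = np.arange(n)          # examples no attacker has touched yet
    poisoned = {}
    for k, (trigger, (r, c), target) in enumerate(attacks):
        # Each attacker poisons its own disjoint random subset.
        idx = rng.choice(available, size=int(rate * n), replace=False)
        available = np.setdiff1d(available, idx)
        h, w = trigger.shape[:2]
        x[idx, r:r + h, c:c + w] = trigger   # stamp the patch trigger
        y[idx] = target                      # dirty label: relabel to target
        poisoned[k] = idx
    return x, y, poisoned

# Hypothetical example: two attackers on random CIFAR-10-shaped data.
data_rng = np.random.default_rng(1)
images = data_rng.random((1000, 32, 32, 3))
labels = data_rng.integers(0, 10, size=1000)
attacks = [
    (np.ones((3, 3, 3)), (0, 0), 0),     # white square, top-left, target class 0
    (np.zeros((3, 3, 3)), (29, 29), 1),  # black square, bottom-right, target class 1
]
x, y, poisoned = poison_multi(images, labels, attacks, rate=0.01)
```

In this toy setup, the two attackers touch disjoint 1% subsets and use different trigger locations and target classes. Each poisoned subset therefore encodes its own trigger-to-label shortcut in the same training set.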
What problem does this paper attempt to address?
The paper addresses how to protect machine-learning models against multiple concurrent backdoor attacks. Most existing backdoor defenses are evaluated under the assumption that only one attack occurs at a time, and this does not match practice. Large-scale machine-learning systems are usually trained on large datasets scraped from the Internet, and such datasets may be attacked multiple times by one or more attackers. The paper shows that several backdoors can be installed in a single model simultaneously without significantly reducing its accuracy on clean data, and that existing defenses fail to prevent attacks in this multi-attack setting.

To address this, the paper proposes a new defense, BaDLoss, which identifies likely backdoor samples by analyzing the loss dynamics of each sample during training. Experiments on CIFAR-10 and GTSRB show that BaDLoss substantially reduces the success rates of multiple simultaneous attacks while keeping clean-data accuracy high.
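The summary says only that BaDLoss analyzes per-sample loss dynamics. The sketch below shows one way such an analysis could work, but it is not the paper's algorithm. It assumes two things: each example's training loss has been recorded at every epoch, and a small reference set of examples is known to be clean. The sketch scores each example by the mean distance from its loss trajectory to its k nearest clean reference trajectories, then flags the highest-scoring fraction. The scoring rule, the function names `knn_anomaly_scores` and `flag_suspicious`, and the synthetic loss curves are hypothetical.

```python
import numpy as np

def knn_anomaly_scores(trajectories, reference_idx, k=5):
    """Score examples by how far their loss trajectories lie from clean ones.

    trajectories:  array (N, T), per-example training loss at T epochs
    reference_idx: indices of examples assumed clean (hypothetical probe set)
    Returns an (N,) array: mean Euclidean distance to the k nearest reference
    trajectories. Larger means more anomalous. References include themselves,
    so their own scores are slightly lower.
    """
    ref = trajectories[reference_idx]                                        # (R, T)
    d = np.linalg.norm(trajectories[:, None, :] - ref[None, :, :], axis=-1)  # (N, R)
    nearest = np.sort(d, axis=1)[:, :k]
    return nearest.mean(axis=1)

def flag_suspicious(scores, remove_frac):
    """Return indices of the remove_frac fraction with the highest scores."""
    n_remove = int(round(remove_frac * len(scores)))
    return np.argsort(scores)[::-1][:n_remove]

# Synthetic demo with made-up loss curves (not data from the paper).
rng = np.random.default_rng(0)
T = 10
epochs = np.arange(T)
n_clean, n_poison = 190, 10
# Assumed shape: clean examples are learned gradually.
clean = 2.3 * np.exp(-0.2 * epochs) + 0.05 * rng.standard_normal((n_clean, T))
# Assumed shape: backdoored examples are learned almost immediately.
poison = 2.3 * np.exp(-2.0 * epochs) + 0.05 * rng.standard_normal((n_poison, T))
traj = np.vstack([clean, poison])          # rows 190..199 are the poisoned ones
reference = rng.choice(n_clean, size=20, replace=False)

scores = knn_anomaly_scores(traj, reference, k=5)
flagged = flag_suspicious(scores, remove_frac=0.05)
```

The synthetic curves build in the assumption that backdoored examples are fit much faster than clean ones, so their trajectories lie far from every clean reference. In a pipeline built on this sketch, the flagged indices would be excluded from the training set before the model is trained again.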