Stronger Data Poisoning Attacks Break Data Sanitization Defenses

Pang Wei Koh, Jacob Steinhardt, Percy Liang
DOI: https://doi.org/10.48550/arXiv.1811.00741
2021-12-03
Abstract: Machine learning models trained on data from the outside world can be corrupted by data poisoning attacks that inject malicious points into the models' training sets. A common defense against these attacks is data sanitization: first filter out anomalous training points before training the model. In this paper, we develop three attacks that can bypass a broad range of common data sanitization defenses, including anomaly detectors based on nearest neighbors, training loss, and singular-value decomposition. By adding just 3% poisoned data, our attacks successfully increase test error on the Enron spam detection dataset from 3% to 24% and on the IMDB sentiment classification dataset from 12% to 29%. In contrast, existing attacks which do not explicitly account for these data sanitization defenses are defeated by them. Our attacks are based on two ideas: (i) we coordinate our attacks to place poisoned points near one another, and (ii) we formulate each attack as a constrained optimization problem, with constraints designed to ensure that the poisoned points evade detection. As this optimization involves solving an expensive bilevel problem, our three attacks correspond to different ways of approximating this problem, based on influence functions; minimax duality; and the Karush-Kuhn-Tucker (KKT) conditions. Our results underscore the need to develop more robust defenses against data poisoning attacks.
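The bilevel structure mentioned in the abstract can be written schematically as follows. This is a sketch in assumed notation, not a formula quoted from this page: $\mathcal{D}_c$ is the clean training set, $\mathcal{D}_p$ the poisoned points, $\epsilon$ the poisoning budget (0.03 for the 3% figure above), $\mathcal{F}$ the set of points that pass the sanitization defense, and $L$ the loss.

```latex
\begin{aligned}
\max_{\mathcal{D}_p} \quad & L(\hat{\theta};\, \mathcal{D}_{\text{test}}) \\
\text{s.t.} \quad & |\mathcal{D}_p| = \epsilon\,|\mathcal{D}_c|, \qquad \mathcal{D}_p \subseteq \mathcal{F}, \\
& \hat{\theta} = \arg\min_{\theta}\; L(\theta;\, \mathcal{D}_c \cup \mathcal{D}_p)
\end{aligned}
```

The outer problem chooses the poisoned points, while the inner $\arg\min$ retrains the model on the combined data. Under this reading, the nesting is what makes the problem expensive, and it is the part that the influence-function, minimax-duality, and KKT-based attacks each approximate in a different way.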
Machine Learning, Cryptography and Security
What problem does this paper attempt to address?
This paper asks whether data sanitization, a common defense against data poisoning, holds up against attackers who explicitly design their attacks to evade it. When machine learning models are trained on data from the outside world, an attacker can corrupt them by injecting malicious data points into the training set. Data sanitization counters this by filtering out anomalous training points before training the model. Existing data poisoning attacks generally do not account for these filters and are therefore detected and defeated by them. The paper's main contribution is three new attacks that bypass widely used data sanitization defenses, including anomaly detectors based on nearest neighbors, training loss, and singular-value decomposition. By adding only 3% poisoned data, these attacks raise the test error on the Enron spam detection dataset from 3% to 24% and on the IMDB sentiment classification dataset from 12% to 29%. The results show that more robust defenses against data poisoning attacks are needed.

### Specific problems solved by the paper:

1. **Bypassing data sanitization defenses**: The proposed attacks bypass a variety of commonly used data sanitization defenses, including anomaly detectors based on nearest neighbors, training loss, and singular-value decomposition.
2. **Optimizing attack strategies**: The paper builds its attacks on two strategies for evading data sanitization:
   - **Concentrated attack**: Placing the poisoned points near one another, in a few locations, makes them look non-anomalous, so they avoid detection.
   - **Constrained optimization**: Each attack is modeled as a constrained optimization problem whose goal is to maximize the model's loss on the test set, subject to constraints that ensure the poisoned points evade the defense.
3. **Handling integer constraints**: For cases where the input features are restricted to non-negative integers (such as bag-of-words models in natural language tasks), the paper proposes a stochastic rounding method to handle this constraint.

### Main conclusions:

- Poisoned data need not appear anomalous. If the poisoned points are carefully coordinated, each one can look normal.
- Poisoned points do not necessarily have high loss under the poisoned model, so defenders cannot simply discard high-loss points.
- Regularization reduces the influence of any single data point on the model, but over-regularization makes defenders more vulnerable to attack.

The paper emphasizes that defenses should be designed with attackers' targeted strategies in mind, rather than tested only against basic attacks. It identifies the development of more robust defenses against such attacks as an important direction for future research.
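The stochastic rounding idea mentioned for integer-valued features can be illustrated with a minimal sketch. It assumes the common unbiased scheme, in which each relaxed value is rounded up with probability equal to its fractional part; the paper's exact procedure may differ. The function name `stochastic_round` and the example vector `x_relaxed` are hypothetical and chosen only for illustration.

```python
import numpy as np

def stochastic_round(x, rng):
    """Round each entry to a neighboring non-negative integer.

    Assumed scheme: round up with probability equal to the fractional
    part, so the rounded value equals x in expectation.
    """
    x = np.clip(np.asarray(x, dtype=float), 0.0, None)  # enforce non-negativity
    floor = np.floor(x)
    frac = x - floor                       # probability of rounding up
    round_up = rng.random(x.shape) < frac  # independent coin flip per feature
    return (floor + round_up).astype(int)

rng = np.random.default_rng(0)

# Hypothetical continuous bag-of-words counts from a relaxed optimization.
x_relaxed = np.array([0.2, 1.7, 3.0, 0.0])

# Draw many rounded copies and check that the average tracks the relaxed point.
samples = np.stack([stochastic_round(x_relaxed, rng) for _ in range(10000)])
mean_counts = samples.mean(axis=0)
print(mean_counts)
```

Each rounded sample is a valid integer feature vector, while the average over samples stays close to the continuous solution. That is the property that would let an attacker optimize in a continuous space and still submit integer-valued poisoned points.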