Abstract: Machine learning models trained on data from the outside world can be corrupted by data poisoning attacks that inject malicious points into the models' training sets. A common defense against these attacks is data sanitization: first filter out anomalous training points before training the model. In this paper, we develop three attacks that can bypass a broad range of common data sanitization defenses, including anomaly detectors based on nearest neighbors, training loss, and singular-value decomposition. By adding just 3% poisoned data, our attacks successfully increase test error on the Enron spam detection dataset from 3% to 24% and on the IMDB sentiment classification dataset from 12% to 29%. In contrast, existing attacks which do not explicitly account for these data sanitization defenses are defeated by them. Our attacks are based on two ideas: (i) we coordinate our attacks to place poisoned points near one another, and (ii) we formulate each attack as a constrained optimization problem, with constraints designed to ensure that the poisoned points evade detection. As this optimization involves solving an expensive bilevel problem, our three attacks correspond to different ways of approximating this problem, based on influence functions; minimax duality; and the Karush-Kuhn-Tucker (KKT) conditions. Our results underscore the need to develop more robust defenses against data poisoning attacks.
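The constrained, bilevel problem described in the abstract can be written out as below. The notation is an illustrative assumption, not necessarily the paper's own: $\mathcal{D}_c$ is the clean training set, $\mathcal{D}_p$ the poisoned points, $\epsilon$ the poisoning budget ($\epsilon = 0.03$ for the 3% attacks), $\mathcal{F}$ the set of points that pass the sanitization defense, and $L(\theta; \mathcal{D})$ the (possibly regularized) loss of parameters $\theta$ on dataset $\mathcal{D}$.

$$
\begin{aligned}
\max_{\mathcal{D}_p} \quad & L\big(\hat{\theta};\, \mathcal{D}_{\text{test}}\big) \\
\text{s.t.} \quad & \mathcal{D}_p \subseteq \mathcal{F}, \quad |\mathcal{D}_p| = \epsilon\, |\mathcal{D}_c|, \\
& \hat{\theta} = \arg\min_{\theta}\, L\big(\theta;\, \mathcal{D}_c \cup \mathcal{D}_p\big).
\end{aligned}
$$

The last constraint is itself an optimization problem (training the model on the poisoned data), which is what makes the problem bilevel and expensive. According to the abstract, the three attacks approximate it using influence functions, minimax duality, and the KKT conditions.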

What problem does this paper attempt to address?

The paper examines how well data sanitization protects machine-learning models against data poisoning attacks. When machine-learning models are trained on data collected from the outside world, an attacker can inject malicious points into the training set to corrupt the model. A common defense is data sanitization: filtering out anomalous training points before training the model. However, existing poisoning attacks usually do not account for these defenses, so they are easily detected and removed. The paper's main contribution is three new attacks that bypass widely used sanitization defenses, including anomaly detectors based on nearest neighbors, training loss, and singular-value decomposition. By adding only 3% poisoned data, these attacks raise the test error on the Enron spam-detection dataset from 3% to 24%, and on the IMDB sentiment-classification dataset from 12% to 29%. This shows that more robust defenses are needed to protect against data poisoning.

### Specific problems solved by the paper:

1. **Bypassing data-sanitization defenses**: The proposed attacks evade a variety of commonly used sanitization techniques, including but not limited to anomaly detectors based on nearest neighbors, training loss, and singular-value decomposition.
2. **Optimizing attack strategies**: The paper combines two ideas to bypass sanitization defenses:
   - **Concentrated attack**: Placing the poisoned points close to one another, in a few locations, so that no individual point looks anomalous and the points avoid detection.
   - **Constrained optimization**: Formulating the attack as a constrained optimization problem that maximizes the model's loss on the test set, with constraints that ensure the poisoned points evade the defense.
3. **Handling integer constraints**: When input features are restricted to non-negative integers (such as bag-of-words features in natural-language tasks), the paper uses stochastic rounding to satisfy this constraint.

### Main conclusions:

- Poisoned data does not need to look anomalous. When the poisoned points are carefully coordinated, each one can appear normal.
- Poisoned points do not necessarily have high loss under the poisoned model, so defenders cannot simply discard high-loss points.
- Regularization reduces the influence of any single data point on the model, but over-regularization makes defenders more vulnerable to attacks.

Overall, the paper emphasizes that defenses should be evaluated against attacks designed specifically to target them, not only against basic attacks. This points future research toward developing more robust defenses that hold up against sophisticated attacks.
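The coordination idea can be illustrated with a minimal sketch, assuming a simple k-nearest-neighbor detector that discards the points farthest from their k-th nearest neighbor. The functions `knn_scores` and `knn_filter`, the synthetic Gaussian data, the poison location, and the values `k = 5` and `keep_frac = 0.95` are all hypothetical choices for illustration, not the paper's exact defense or attack. A single isolated poison point is far from every clean point, so it should be flagged; six copies stacked at the same location (3% of 200 clean points) are each other's nearest neighbors, so their k-NN distance is zero and they should pass the filter.

```python
import numpy as np

def knn_scores(X, k=5):
    """Distance from each row of X to its k-th nearest *other* row (larger = more anomalous)."""
    diffs = X[:, None, :] - X[None, :, :]     # pairwise differences, shape (n, n, d)
    D = np.sqrt((diffs ** 2).sum(axis=-1))    # pairwise Euclidean distances
    np.fill_diagonal(D, np.inf)               # exclude each point's distance to itself
    return np.sort(D, axis=1)[:, k - 1]       # k-th smallest distance in each row

def knn_filter(X, k=5, keep_frac=0.95):
    """Hypothetical sanitizer: keep points whose k-NN score is at or below the keep_frac quantile."""
    scores = knn_scores(X, k)
    return scores <= np.quantile(scores, keep_frac)

rng = np.random.default_rng(0)
X_clean = rng.normal(size=(200, 2))   # synthetic clean training data
target = np.array([4.0, 4.0])         # hypothetical poison location, away from the clean data

# Attack A: one isolated poison point.
mask_a = knn_filter(np.vstack([X_clean, target]))

# Attack B: six poison copies (3% of 200) stacked at the same location.
mask_b = knn_filter(np.vstack([X_clean, np.tile(target, (6, 1))]))

print("isolated point kept:", bool(mask_a[-1]))
print("stacked points kept:", bool(mask_b[-6:].all()))
```

With `k = 5`, six coincident copies suffice to drive each copy's score to zero; a detector with a larger `k` would need at least `k + 1` copies to hide in the same way.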
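Stochastic rounding for integer-valued features can be sketched as follows, under stated assumptions: each non-negative coordinate is rounded up with probability equal to its fractional part, so the rounded point equals the continuous point in expectation. The rejection loop, which resamples until the integer point lies within a fixed L2 radius of a class centroid (a stand-in for a sphere-style sanitization constraint), is an assumption added for illustration; the helper names `stochastic_round` and `round_within_sphere` and all numbers are hypothetical and not taken from the paper.

```python
import numpy as np

def stochastic_round(x, rng):
    """Round each entry up with probability equal to its fractional part.

    This is unbiased: E[stochastic_round(x)] == x.
    """
    low = np.floor(x)
    frac = x - low
    return low + (rng.random(x.shape) < frac)

def round_within_sphere(x, centroid, radius, rng, max_tries=100):
    """Hypothetical helper: resample roundings until the integer point lies
    within `radius` (L2) of `centroid`, mimicking a sphere-style defense."""
    for _ in range(max_tries):
        z = stochastic_round(x, rng)
        if np.linalg.norm(z - centroid) <= radius:
            return z
    return None  # no feasible rounding found; the caller must fall back

rng = np.random.default_rng(0)
x_cont = np.array([0.2, 1.7, 3.5, 0.0, 2.9])    # continuous poison point (word counts)
centroid = np.array([0.5, 1.5, 3.0, 0.2, 2.5])  # hypothetical class centroid
z = round_within_sphere(x_cont, centroid, radius=1.2, rng=rng)
print("feasible integer point:", z)

# Unbiasedness check: averaging many roundings recovers x_cont.
samples = np.stack([stochastic_round(x_cont, rng) for _ in range(20000)])
print("mean of roundings:", samples.mean(axis=0).round(2))
```

With `radius=1.2`, any rounding that sends the third coordinate to 4 lies about 1.34 from the centroid and is rejected, so the returned point always has 3 in that position.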

Stronger Data Poisoning Attacks Break Data Sanitization Defenses
