Abstract: Adversarial data poisoning is an effective attack against machine learning and threatens model integrity by introducing poisoned data into the training dataset. So far, it has been studied mostly for classification, even though regression learning is used in many mission-critical systems (such as dosage of medication, control of cyber-physical systems and managing power supply). Therefore, in the present research, we aim to evaluate all aspects of data poisoning attacks on regression learning, exceeding previous work both in terms of breadth and depth. We present realistic scenarios in which data poisoning attacks threaten production systems and introduce a novel black-box attack, which is then applied to a real-world medical use case. As a result, we observe that the mean squared error (MSE) of the regressor increases to 150 percent due to inserting only two percent of poison samples. Finally, we present a new defense strategy against the novel and previous attacks and evaluate it thoroughly on 26 datasets. As a result of the conducted experiments, we conclude that the proposed defense strategy effectively mitigates the considered attacks.

What problem does this paper attempt to address?

This paper studies data poisoning attacks on regression learning, and the corresponding defense strategies, in depth. Specifically:

1. **Evaluating the impact of data poisoning attacks**: Using the medical scenario of Warfarin dose prediction, the paper shows that even a small amount of poisoning significantly degrades the regression model. After inserting only 2% poisoned samples, the mean squared error (MSE) rises to 150% of its clean value.
2. **Proposing a new black-box attack method**: The paper introduces the Flip attack, a black-box method that effectively poisons non-linear regression models (such as neural networks, kernel support vector regression, and kernel regression). Unlike previous methods, the Flip attack generates effective poisoned samples without knowing the internal structure of the model.
3. **Improving defense strategies**: The paper proposes an iterative pruning defense strategy (iTrim), which identifies and removes poisoned samples more effectively by iteratively searching for the best estimate \(\hat{\epsilon}\) of the poisoning proportion. In experiments on 26 datasets, iTrim outperforms existing defense methods.

### Main contributions

- **Demonstrating the harm of data poisoning attacks in regression learning**: A concrete medical use case demonstrates that data poisoning attacks seriously affect regression models.
- **Proposing a new black-box attack method**: The Flip attack applies to linear models and also attacks non-linear models effectively.
- **Improving defense strategies**: iTrim improves the robustness and effectiveness of the defense by iteratively searching for the best \(\hat{\epsilon}\).
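The summary above does not spell out how the Flip attack builds its poison points, so the following is only a minimal sketch under stated assumptions: targets are scaled to a known range `[y_min, y_max]`, the attacker copies the features of the training points whose targets lie farthest from one boundary, and sets each copied target to that far boundary. The helper names (`fit_line`, `mse`, `flip_poison`), the 1-D linear toy regressor, and the synthetic data are all hypothetical and chosen for illustration; they are not taken from the paper. The sketch shows how a 2% poison rate can be compared against the clean baseline as an MSE ratio.

```python
import random

def fit_line(xs, ys):
    """Closed-form least squares for y = a*x + b (1-D toy regressor)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def mse(model, xs, ys):
    """Mean squared error of the line `model` on (xs, ys)."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def flip_poison(xs, ys, eps, y_min=0.0, y_max=1.0):
    """Assumed flip-style rule: pick the eps * n points whose target is
    farthest from one boundary of [y_min, y_max], copy their features,
    and set the target to that far boundary."""
    k = round(eps * len(xs))
    mid = (y_min + y_max) / 2
    # Distance from each target to the boundary farther away from it.
    far = [max(y - y_min, y_max - y) for y in ys]
    chosen = sorted(range(len(ys)), key=lambda i: far[i], reverse=True)[:k]
    px = [xs[i] for i in chosen]
    py = [y_max if ys[i] < mid else y_min for i in chosen]
    return px, py

# Synthetic data (an assumption, not the paper's Warfarin dataset).
rng = random.Random(0)
train_x = [rng.random() for _ in range(200)]
train_y = [0.8 * x + 0.1 + rng.gauss(0, 0.02) for x in train_x]
test_x = [rng.random() for _ in range(200)]
test_y = [0.8 * x + 0.1 + rng.gauss(0, 0.02) for x in test_x]

clean_model = fit_line(train_x, train_y)
px, py = flip_poison(train_x, train_y, eps=0.02)  # 2% poison samples
poisoned_model = fit_line(train_x + px, train_y + py)

clean_mse = mse(clean_model, test_x, test_y)
poisoned_mse = mse(poisoned_model, test_x, test_y)
print(f"MSE ratio (poisoned / clean): {poisoned_mse / clean_mse:.2f}")
```

A ratio of 1.5 would correspond to the "increases to 150 percent" figure in the abstract. The ratio this toy setup prints depends entirely on the synthetic data and should not be read as a reproduction of the paper's result.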
### Experimental verification

The paper reports extensive experiments on 26 datasets to verify the effectiveness of the Flip attack and the superiority of the iTrim defense strategy. The results show that:

- **The attack effect is significant**: Even with only 2% poisoned samples, model performance drops significantly.
- **The defense effect is excellent**: iTrim effectively identifies and removes poisoned samples and restores model performance.

Overall, the paper addresses a gap in research on data poisoning attacks and defenses for regression learning, contributing a new attack method and an improved defense strategy for securing machine learning systems.
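The idea of trimming under several candidate values of \(\hat{\epsilon}\) can be sketched as follows. This is a simplified illustration, not the paper's iTrim procedure. It assumes a trimmed least-squares loop (fit, keep the points with the smallest residuals, refit until the kept set stops changing) and simply reports the trimmed loss for each candidate. The rule iTrim uses to pick the best \(\hat{\epsilon}\) is not given in the summary, so it is left out here. The names `fit_line` and `trim_fit`, the initialization from all points, the hand-placed poison points, and the convention that \(\hat{\epsilon}\) is measured relative to the number of clean points are all assumptions.

```python
import random

def fit_line(xs, ys):
    """Closed-form least squares for y = a*x + b (1-D toy regressor)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def trim_fit(xs, ys, eps_hat, max_iter=50):
    """Trimmed least squares: alternate between fitting on the kept
    subset and re-selecting the points with the smallest residuals."""
    n_total = len(xs)
    # Assumed convention: poison count = eps_hat * (number of clean points).
    n_keep = round(n_total / (1.0 + eps_hat))
    kept = list(range(n_total))  # assumption: start from all points
    for _ in range(max_iter):
        a, b = fit_line([xs[i] for i in kept], [ys[i] for i in kept])
        res = [(a * xs[i] + b - ys[i]) ** 2 for i in range(n_total)]
        new_kept = sorted(sorted(range(n_total), key=lambda i: res[i])[:n_keep])
        if new_kept == kept:
            break
        kept = new_kept
    # Final refit so the model and the loss match the final kept set.
    a, b = fit_line([xs[i] for i in kept], [ys[i] for i in kept])
    loss = sum((a * xs[i] + b - ys[i]) ** 2 for i in kept) / len(kept)
    return (a, b), kept, loss

# 200 synthetic clean points plus 4 hand-placed poison points (indices 200..203).
rng = random.Random(1)
xs = [rng.random() for _ in range(200)]
ys = [0.8 * x + 0.1 + rng.gauss(0, 0.02) for x in xs]
xs += [0.0, 0.02, 1.0, 0.98]
ys += [1.0, 1.0, 0.0, 0.0]
poison_idx = {200, 201, 202, 203}

results = {}
for eps_hat in [0.0, 0.01, 0.02, 0.04]:  # candidate poisoning proportions
    model, kept, loss = trim_fit(xs, ys, eps_hat)
    results[eps_hat] = (kept, loss)
    n_poison_kept = len(poison_idx & set(kept))
    print(f"eps_hat={eps_hat:.2f} kept={len(kept)} "
          f"poison kept={n_poison_kept} trimmed loss={loss:.5f}")
```

In this toy setup the trimmed loss drops sharply once the candidate \(\hat{\epsilon}\) reaches the true poison rate, which suggests why a search over \(\hat{\epsilon}\) can be informative. How iTrim turns such a curve into a final choice is specified in the paper, not here.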

Data Poisoning Attacks on Regression Learning and Corresponding Defenses
