Abstract: With the rise of third parties in the machine learning pipeline, such as the service provider in "Machine Learning as a Service" (MLaaS), external data contributors in online learning, or parties that retrain existing models, ensuring the security of the resulting machine learning models has become increasingly important. The security community has shown that, without transparency of the data and the resulting model, many potential security risks exist, and new ones are constantly being discovered. In this paper, we focus on one of these risks: poisoning attacks. Specifically, we analyze how attackers may interfere with the results of regression learning by poisoning the training dataset. To this end, we analyze and develop a new poisoning attack algorithm. Our attack, termed Nopt, produces larger errors than previous poisoning attack algorithms with the same proportion of poisoning data points. Furthermore, we significantly improve the state-of-the-art defense algorithm TRIM, proposed by Jagielski et al. (IEEE S&P 2018), by incorporating the probability estimation of clean data points into the algorithm. Our new defense algorithm, termed Proda, is more effective at reducing the errors arising from the poisoned dataset by optimizing ensemble models. We highlight that the time complexity of TRIM had not previously been estimated; however, we deduce from their work that TRIM can take exponential time in the worst case, exceeding Proda's logarithmic time. We extensively evaluate both the proposed attack and defense algorithms on four real-world datasets covering housing prices, loans, health care, and bike-sharing services. We hope that our work will inspire future research to develop more robust learning algorithms immune to poisoning attacks.
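
TRIM, the baseline defense named above, alternates between fitting a regression model on a subset of points assumed to be clean and re-selecting the points with the smallest residuals under that model. The sketch below illustrates only that alternating idea. It assumes a one-dimensional least-squares line rather than the multivariate regressors considered in the paper, a known number `n_clean` of clean points, and a random initial subset. The helper names `fit_line` and `trim` are hypothetical, and the sketch does not implement Nopt or Proda, whose details are not given here.

```python
import random


def fit_line(points):
    """Ordinary least-squares fit of y = w*x + b to a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    w = sxy / sxx if sxx else 0.0
    return w, my - w * mx


def trim(points, n_clean, max_iter=100, seed=0):
    """Hypothetical TRIM-style loop: fit on a subset, keep the n_clean
    points with the smallest squared residuals, and repeat until the
    trimmed loss stops decreasing."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(points)), n_clean)  # random initial subset
    prev_loss = float("inf")
    for _ in range(max_iter):
        w, b = fit_line([points[i] for i in idx])
        sq_res = [(y - (w * x + b)) ** 2 for x, y in points]
        # Re-select the n_clean points that the current model fits best.
        idx = sorted(range(len(points)), key=lambda i: sq_res[i])[:n_clean]
        loss = sum(sq_res[i] for i in idx)
        if loss >= prev_loss - 1e-12:  # trimmed loss no longer decreasing
            break
        prev_loss = loss
    w, b = fit_line([points[i] for i in idx])  # final fit on the kept subset
    return w, b, sorted(idx)


# Illustrative data only: 20 clean points on y = 2x + 1 plus 4 injected outliers.
points = [(x, 2 * x + 1) for x in range(20)] + [(10, 200)] * 4
print("plain OLS:", fit_line(points))
print("TRIM:     ", trim(points, n_clean=20)[:2])
```

On this toy data, plain least squares is pulled upward by the four injected points, while the trimmed fit recovers the clean line. The trade-off is that `n_clean` must be known or estimated in advance, which is the kind of clean-data estimate the abstract says Proda incorporates probabilistically.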

Machine Learning with Electronic Health Records is vulnerable to Backdoor Trigger Attacks

B3: Backdoor Attacks Against Black-box Machine Learning Models

Adversarial Attacks to Machine Learning-Based Smart Healthcare Systems

BadCLM: Backdoor Attack in Clinical Language Models for Electronic Health Records

MedAttacker: Exploring Black-Box Adversarial Attacks on Risk Prediction Models in Healthcare

Longitudinal Adversarial Attack on Electronic Health Records Data

Systematic Evaluation of Backdoor Data Poisoning Attacks on Image Classifiers

Addressing Adversarial Machine Learning Attacks in Smart Healthcare Perspectives

Exploiting Machine Unlearning for Backdoor Attacks in Deep Learning System

Backdoor Attacks via Machine Unlearning

Systematically Assessing the Security Risks of AI/ML-enabled Connected Healthcare Systems

Demystifying Poisoning Backdoor Attacks from a Statistical Perspective

Hiding Backdoors within Event Sequence Data via Poisoning Attacks

The Dependence of Machine Learning on Electronic Medical Record Quality

Model Agnostic Defence against Backdoor Attacks in Machine Learning

Analysis on Data Poisoning Attack Detection Using Machine Learning Techniques and Artificial Intelligence

Potential Biases in Machine Learning Algorithms Using Electronic Health Record Data

Remembering Everything Makes You Vulnerable: A Limelight on Machine Unlearning for Personalized Healthcare Sector

Data Poisoning Attacks on Regression Learning and Corresponding Defenses

Exposing Vulnerabilities in Clinical LLMs Through Data Poisoning Attacks: Case Study in Breast Cancer

With Great Dispersion Comes Greater Resilience: Efficient Poisoning Attacks and Defenses for Linear Regression Models