Machine Learning with Electronic Health Records is vulnerable to Backdoor Trigger Attacks

Byunggill Joe,Akshay Mehra,Insik Shin,Jihun Hamm
DOI: https://doi.org/10.48550/arXiv.2106.07925
2021-06-15
Abstract:Electronic Health Records (EHRs) provide a wealth of information for machine learning algorithms to predict the patient outcome from the data including diagnostic information, vital signals, lab tests, drug administration, and demographic information. Machine learning models can be built, for example, to evaluate patients based on their predicted mortality or morbidity and to predict required resources for efficient resource management in hospitals. In this paper, we demonstrate that an attacker can manipulate the machine learning predictions with EHRs easily and selectively at test time by backdoor attacks with the poisoned training data. Furthermore, the poison we create has statistically similar features to the original data making it hard to detect, and can also attack multiple machine learning models without any knowledge of the models. With less than 5% of the raw EHR data poisoned, we achieve average attack success rates of 97% on mortality prediction tasks with MIMIC-III database against Logistic Regression, Multilayer Perceptron, and Long Short-term Memory models simultaneously.
Machine Learning,Artificial Intelligence
What problem does this paper attempt to address?