Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses

Micah Goldblum,Dimitris Tsipras,Chulin Xie,Xinyun Chen,Avi Schwarzschild,Dawn Song,Aleksander Mądry,Bo Li,Tom Goldstein,Aleksander Madry
DOI: https://doi.org/10.1109/tpami.2022.3162397
IF: 23.6
2022-01-01
IEEE Transactions on Pattern Analysis and Machine Intelligence
Abstract:As machine learning systems grow in scale, so do their training data requirements, forcing practitioners to automate and outsource the curation of training data in order to achieve state-of-the-art performance. The absence of trustworthy human supervision over the data collection process exposes organizations to security vulnerabilities; training data can be manipulated to control and degrade the downstream behaviors of learned models. The goal of this work is to systematically categorize and discuss a wide range of dataset vulnerabilities and exploits, approaches for defending against these threats, and an array of open problems in this space.
computer science, artificial intelligence,engineering, electrical & electronic
What problem does this paper attempt to address?