Privacy and machine learning: two unexpected allies?

Nicolas Papernot, Ian Goodfellow
2018-01-01
Abstract: In many applications of machine learning, such as machine learning for medical diagnosis, we would like to have machine learning algorithms that do not memorize sensitive information about the training set, such as the specific medical histories of individual patients. Differential privacy is a framework for measuring the privacy guarantees provided by an algorithm. Through the lens of differential privacy, we can design machine learning algorithms that responsibly train models on private data. Our work (with Martín Abadi, Úlfar Erlingsson, Ilya Mironov, Ananth Raghunathan, Shuang Song and Kunal Talwar) on differential privacy for machine learning has made it very easy for machine learning researchers to contribute to privacy research, even without being experts on the mathematics of differential privacy. In this blog post, we'll show you how to do it.

The key is a family of algorithms called Private Aggregation of Teacher Ensembles (PATE). One of the great things about the PATE framework, besides its name, is that anyone who knows how to train a supervised ML model (such as a neural net) can now contribute to research on differential privacy for machine learning. The PATE framework achieves private learning by carefully coordinating the activity of several different ML models. As long as you follow the procedure specified by the PATE framework, the overall resulting model will have measurable privacy guarantees. Each of the individual ML models is trained with ordinary supervised learning techniques, which many of our readers are probably familiar with from hacking on ImageNet classification or many other more traditional ML …
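The abstract describes PATE's coordination of several models only at a high level. As a minimal sketch, assuming the setup of the original PATE paper (teacher models trained on disjoint partitions of the private data, with their votes on each query combined by adding Laplace noise to the per-class vote counts before taking the most-voted class), the aggregation step might look like the Python/NumPy code below. The function names and the noise parameter `gamma` are illustrative choices, not an official API. Later PATE variants use Gaussian rather than Laplace noise, and the sketch omits teacher training, student training, and privacy accounting.

```python
import numpy as np


def partition_indices(n_examples, n_teachers, rng):
    """Shuffle example indices and split them into disjoint shards, one per teacher.

    Because the shards are disjoint, each private example influences only one teacher.
    """
    perm = rng.permutation(n_examples)
    return np.array_split(perm, n_teachers)


def noisy_max_label(teacher_preds, num_classes, gamma, rng):
    """Aggregate one query: count votes per class, add Laplace noise, take the argmax.

    teacher_preds: 1-D integer array with one predicted class per teacher.
    gamma: inverse noise scale (noise ~ Laplace(0, 1/gamma)); smaller gamma adds more noise.
    """
    counts = np.bincount(teacher_preds, minlength=num_classes).astype(float)
    counts += rng.laplace(loc=0.0, scale=1.0 / gamma, size=num_classes)
    return int(np.argmax(counts))


def label_public_queries(all_teacher_preds, num_classes, gamma, rng):
    """Label each public query; all_teacher_preds has shape (n_teachers, n_queries)."""
    n_queries = all_teacher_preds.shape[1]
    return np.array(
        [noisy_max_label(all_teacher_preds[:, j], num_classes, gamma, rng)
         for j in range(n_queries)]
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shards = partition_indices(n_examples=10_000, n_teachers=250, rng=rng)
    print(len(shards), len(shards[0]))  # 250 40
    # Hypothetical teacher votes for one query: 200 say class 3, 50 say class 7.
    votes = np.array([3] * 200 + [7] * 50)
    print(noisy_max_label(votes, num_classes=10, gamma=0.05, rng=rng))
```

When the teachers largely agree, the noise rarely changes the winning label; when the vote is close, it can. Since each private example sits in exactly one shard, changing one example can shift at most one teacher's vote, and the added noise limits how much that single vote can affect the released label. In the PATE setup, these noisy labels would then be used to train a student model on public, unlabeled data with ordinary supervised learning.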