The Trade-Offs of Private Prediction

Laurens van der Maaten, Awni Hannun
DOI: https://doi.org/10.48550/arXiv.2007.05089
2020-07-10
Abstract: Machine learning models leak information about their training data every time they reveal a prediction. This is problematic when the training data needs to remain private. Private prediction methods limit how much information about the training data is leaked by each prediction. Private prediction can also be achieved using models that are trained by private training methods. In private prediction, both private training and private prediction methods exhibit trade-offs between privacy, privacy failure probability, amount of training data, and inference budget. Although these trade-offs are theoretically well-understood, they have hardly been studied empirically. This paper presents the first empirical study into the trade-offs of private prediction. Our study sheds light on which methods are best suited for which learning setting. Perhaps surprisingly, we find private training methods outperform private prediction methods in a wide range of private prediction settings.
Machine Learning
What problem does this paper attempt to address?
This paper asks how to limit the training-data information that a machine-learning model leaks through its predictions while preserving prediction accuracy. Specifically, when the training data must remain private, how can the model owner ensure that each prediction reveals little about that data?

### Problem Background

Every time a machine-learning model makes a prediction, it leaks some information about its training data. This is a serious problem when the training data must be kept confidential. For example, a hotel recommendation system may be trained on users' booking data, which other users should not be able to access. Without safeguards, a user may infer other users' bookings from the system's recommendations, which is a privacy leak.

### Research Objectives

The paper studies and compares private prediction methods and private training methods to understand how they perform in different settings, in particular how they trade off privacy protection against prediction accuracy.

### Main Contributions

1. **First Empirical Study**: The paper presents the first empirical study of private prediction, exploring how different methods trade off privacy, accuracy, probability of privacy failure, amount of training data, and inference budget.
2. **Method Comparison**: The study finds that private training methods outperform private prediction methods on the privacy-accuracy trade-off in a wide range of private prediction settings.
3. **Guiding Practice**: It gives practitioners guidance on which method suits which learning setting.

### Specific Problem Description

Consider a machine-learning model \(\phi(x; \theta)\) with parameters \(\theta\). Given a \(D\)-dimensional input vector \(x\in\mathbb{R}^D\), the model outputs a probability vector \(y\in\Delta^C\) over \(C\) classes. The parameters \(\theta\) are obtained by fitting a training set \(\mathcal{D} =\{(x_1, y_1),\dots,(x_N, y_N)\}\) of \(N\) labeled samples, and this training set must be kept private.

The model owner provides a service that receives an input \(\hat{x}\), computes the prediction \(\hat{y}=\phi(\hat{x}; \theta)\), and publicly releases the result. This creates a risk for the model owner: anyone who observes pairs \((\hat{x},\hat{y})\) by querying the model may learn something about the private training set \(\mathcal{D}\). The model owner therefore wants to bound the information leaked by answering an inference budget of \(B\) queries \(Q =\{\hat{x}_1,\dots,\hat{x}_B\}\).

### Solutions

Two main approaches can limit the leaked information:

1. **Private Training**: Train the model with a differentially private training method, so that the parameters \(\theta\) themselves do not leak too much information about the training set \(\mathcal{D}\).
2. **Private Prediction**: Perturb the predictions of non-private models, so that each released prediction contains only limited information about the training set \(\mathcal{D}\).

The paper compares these two approaches empirically across different settings and offers practitioners suggestions for choosing between them.
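
As a rough illustration of the private prediction idea, the sketch below perturbs the predictions of non-private models and spreads a total privacy budget over \(B\) queries. It assumes one common construction that this summary does not spell out: several models are each trained on a disjoint split of the private data, their predicted labels are counted, and Laplace noise is added to the counts before the winning class is released. The function names `per_query_epsilon` and `private_predict`, the even budget split, and the example votes are all hypothetical choices made for illustration, not necessarily the paper's procedure.

```python
import numpy as np


def per_query_epsilon(total_epsilon: float, budget: int) -> float:
    """Split a total privacy budget evenly over `budget` queries.

    Assumption: basic sequential composition, where the privacy losses
    of the individual queries simply add up.
    """
    if budget < 1:
        raise ValueError("budget must be at least 1")
    return total_epsilon / budget


def private_predict(votes: np.ndarray, num_classes: int, epsilon: float,
                    rng: np.random.Generator) -> int:
    """Release one label from the votes of models trained on disjoint splits.

    Changing one training example changes at most one model's vote, which
    moves two class counts by 1 each (L1 sensitivity 2). Laplace noise with
    scale 2 / epsilon on the counts therefore gives an epsilon-differentially
    private release; taking the argmax is post-processing.
    """
    counts = np.bincount(votes, minlength=num_classes).astype(float)
    noisy = counts + rng.laplace(loc=0.0, scale=2.0 / epsilon, size=num_classes)
    return int(np.argmax(noisy))


# Hypothetical usage: 10 models vote on one query among 3 classes.
rng = np.random.default_rng(0)
votes = np.array([2, 2, 2, 1, 2, 0, 2, 2, 2, 2])
eps = per_query_epsilon(total_epsilon=1.0, budget=100)
label = private_predict(votes, num_classes=3, epsilon=eps, rng=rng)
```

With a large inference budget the per-query epsilon shrinks, the noise scale grows, and the released label becomes less reliable. This is the kind of tension between privacy, accuracy, and inference budget that the paper examines.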