Differentially Private Regression and Classification with Sparse Gaussian Processes

Michael Thomas Smith, Mauricio A. Alvarez, Neil D. Lawrence
DOI: https://doi.org/10.48550/arXiv.1909.09147
2019-09-19
Abstract: A continuing challenge for machine learning is providing methods to perform computation on data while ensuring the data remains private. In this paper we build on the provable privacy guarantees of differential privacy, which has been combined with Gaussian processes through the previously published *cloaking method*. We address several shortcomings of this method, starting with the problem of predictions in regions with low data density. We experiment with the use of inducing points to provide a sparse approximation and show that these can provide robust differential privacy in outlier areas and at higher dimensions. We then look at classification, and modify the Laplace approximation approach to provide differentially private predictions. We then combine this with the sparse approximation and demonstrate the capability to perform classification in high dimensions. Finally, we explore the issue of hyperparameter selection and develop a method for selecting hyperparameters privately. This paper and the associated libraries provide a robust toolkit for combining differential privacy and GPs in a practical manner.
Machine Learning, Cryptography and Security
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses a persistent challenge in machine learning: performing computation on data while keeping that data private. Specifically, the authors combine differential privacy (DP) with Gaussian processes (GPs), building on the earlier cloaking method, to achieve the following goals:

1. **Prediction in low-density regions**:
   - In regions of low data density, the existing cloaking method is sensitive to outliers: the noise that must be added to guarantee differential privacy becomes very large, so predictions there are inaccurate.
   - To address this, the authors introduce inducing points as a sparse approximation, which provides robust differential privacy in outlier regions and in higher-dimensional spaces.
2. **Classification**:
   - The cloaking method was originally applicable only to regression (a Gaussian likelihood), not to classification.
   - The authors modify the Laplace approximation to produce differentially private classification predictions, and combine this with the sparse approximation so that classification can be performed in high dimensions.
3. **Hyperparameter selection**:
   - Hyperparameter optimization had been discussed only briefly, in supplementary material, and had not been fully addressed in previous literature.
   - The authors develop a method for selecting hyperparameters so that the selection itself is also differentially private.

### Key formulas

**Definition of differential privacy.** A randomized mechanism \(R\) is \((\varepsilon, \delta)\)-differentially private if, for all adjacent databases \(D\) and \(D'\) (differing in only one row, i.e. one individual's data) and every set of outputs \(m\),

\[ P(R(D)\in m)\leq e^{\varepsilon}P(R(D')\in m)+\delta \]

Here \(\varepsilon\) controls the amount of privacy loss, and \(\delta\) is a small additive slack; informally, it bounds the probability that the pure \(\varepsilon\) guarantee fails.
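As a hedged illustration of how the \((\varepsilon, \delta)\) definition is met in practice, the sketch below implements the classical Gaussian mechanism (Dwork and Roth, *The Algorithmic Foundations of Differential Privacy*, Theorem A.1). It is not the paper's own mechanism: it adds isotropic noise with \(\sigma = \Delta_2\sqrt{2\ln(1.25/\delta)}/\varepsilon\), where \(\Delta_2\) is the query's L2 sensitivity, and that classical bound only holds for \(0 < \varepsilon < 1\). The function names `gaussian_mechanism_sigma` and `release` are hypothetical.

```python
import numpy as np

def gaussian_mechanism_sigma(l2_sensitivity, epsilon, delta):
    """Noise standard deviation for the classical Gaussian mechanism.

    Theorem A.1 of Dwork & Roth requires 0 < epsilon < 1 and
    c^2 > 2 ln(1.25/delta); the boundary value of c is used here for simplicity.
    """
    if not 0.0 < epsilon < 1.0:
        raise ValueError("the classical bound requires 0 < epsilon < 1")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    return l2_sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon

def release(query_value, l2_sensitivity, epsilon, delta, rng):
    """Return the query value plus isotropic Gaussian noise calibrated to (epsilon, delta)."""
    value = np.asarray(query_value, dtype=float)
    sigma = gaussian_mechanism_sigma(l2_sensitivity, epsilon, delta)
    return value + rng.normal(0.0, sigma, size=value.shape)

# Example: mean of n values in [0, 1]; replacing one record moves it by at most 1/n.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=1000)
private_mean_value = release(x.mean(), 1.0 / len(x), epsilon=0.5, delta=1e-5, rng=rng)
```

The noise scale grows with the sensitivity, which is why a query whose output can be moved a lot by one individual's data needs a lot of noise.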

**Cloaking matrix.**

\[ C = K_{*f}K^{-1} \]

where \(K_{*f}\) is the covariance matrix between the test points and the training points, and \(K\) is the covariance matrix among the training points. The GP predictive mean at the test points is then \(Cy\), a linear function of the training outputs \(y\) (with any observation-noise variance absorbed into \(K\)).

**Laplace approximation update formula.** The posterior mode \(\hat{f}\) is found by Newton iteration:

\[ \hat{f}_{\text{new}}=(K^{-1}+W)^{-1}\left(W\hat{f}+\nabla\log p(y\mid\hat{f})\right) \]

where \(W = -\nabla\nabla\log p(y\mid f)\). For the logistic likelihood, \(W\) is diagonal with elements \(W_{ii} = \pi_i(1-\pi_i)\), where \(\pi_i = p(y_i = 1\mid f_i)=(1 + e^{-f_i})^{-1}\).

Through these improvements, the paper provides a robust toolkit for combining differential privacy and Gaussian processes in practical applications, addressing the shortcomings of existing methods in outlier sensitivity and scope of application.
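The cloaking matrix \(C = K_{*f}K^{-1}\) makes the GP predictive mean \(Cy\) linear in the training outputs, so the effect of any single output on the predictions is one column of \(C\). The sketch below assumes a squared-exponential kernel on 1-D inputs, observation noise absorbed into \(K\), and (for illustration) neighbouring datasets that differ in a single output \(y_i\) by at most \(d\). It then adds *isotropic* Gaussian noise scaled by the largest column norm of \(C\); this is a deliberate simplification, not the optimized noise covariance of the cloaking method, and the names `rbf`, `cloaking_matrix`, `l2_sensitivity` and `noisy_prediction` are hypothetical.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def cloaking_matrix(x_train, x_test, noise_var=0.1, lengthscale=1.0):
    """C = K_{*f} K^{-1}, assuming observation noise is absorbed into K."""
    K = rbf(x_train, x_train, lengthscale) + noise_var * np.eye(len(x_train))
    K_sf = rbf(x_test, x_train, lengthscale)
    # K is symmetric, so (K^{-1} K_sf^T)^T = K_sf K^{-1}; solve instead of inverting.
    return np.linalg.solve(K, K_sf.T).T

def l2_sensitivity(C, d):
    """Largest L2 change in C @ y when one y_i moves by at most d."""
    return d * np.max(np.linalg.norm(C, axis=0))

def noisy_prediction(C, y, d, epsilon, delta, rng):
    """C @ y plus isotropic Gaussian noise (simplification, not the cloaking covariance)."""
    sigma = l2_sensitivity(C, d) * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    mean = C @ y
    return mean + rng.normal(0.0, sigma, size=mean.shape)

# Example: outputs assumed to lie in [-1, 1], so one output can change by at most d = 2.
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 5.0, 30)
y_train = np.sin(x_train)
x_test = np.linspace(0.0, 5.0, 10)
C = cloaking_matrix(x_train, x_test)
y_private = noisy_prediction(C, y_train, d=2.0, epsilon=0.5, delta=1e-5, rng=rng)
```

Solving with \(K\) rather than forming \(K^{-1}\) is the usual numerically stable choice. Because the sensitivity here is the largest column norm, a single training output that dominates nearby predictions (as an isolated point in a sparse region can) drives up the noise for every test point, which is the kind of behaviour the paper's inducing-point approximation is meant to mitigate.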
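The Laplace Newton update can be sketched in NumPy as follows, assuming labels \(y_i \in \{0, 1\}\) and the logistic likelihood, for which \(\nabla\log p(y\mid f) = y - \pi\) and \(W = \operatorname{diag}(\pi_i(1-\pi_i))\). It uses the identity \((K^{-1}+W)^{-1} = K(I + WK)^{-1}\) so that \(K\) is never inverted. This is the standard, non-private mode-finding step (as described in Rasmussen and Williams, *Gaussian Processes for Machine Learning*, Section 3.4); the paper's differentially private modification is not reproduced, and the names `laplace_mode` and `latent_test_mean` are hypothetical.

```python
import numpy as np

def sigmoid(f):
    return 1.0 / (1.0 + np.exp(-f))

def laplace_mode(K, y, n_iter=100, tol=1e-10):
    """Newton iteration for the posterior mode f_hat of binary GP classification.

    y holds labels in {0, 1}; p(y_i = 1 | f_i) = sigmoid(f_i).
    Each step computes f_new = (K^{-1} + W)^{-1} (W f + grad log p(y|f)),
    evaluated as K (I + W K)^{-1} (W f + grad) so K is never inverted.
    """
    n = len(y)
    f = np.zeros(n)
    for _ in range(n_iter):
        pi = sigmoid(f)
        grad = y - pi                    # gradient of the logistic log-likelihood
        W = np.diag(pi * (1.0 - pi))     # W = -Hessian; positive on the diagonal
        b = W @ f + grad
        f_new = K @ np.linalg.solve(np.eye(n) + W @ K, b)
        if np.max(np.abs(f_new - f)) < tol:
            return f_new
        f = f_new
    return f

def latent_test_mean(K_sf, y, f_hat):
    """Approximate latent mean at test inputs: K_{*f} grad log p(y | f_hat)."""
    return K_sf @ (y - sigmoid(f_hat))

# Example: 1-D inputs, labels switch from 0 to 1 at x = 0.
x = np.linspace(-3.0, 3.0, 20)
y = (x > 0).astype(float)
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)
f_hat = laplace_mode(K, y)
x_star = np.array([-2.0, 2.0])
f_star = latent_test_mean(np.exp(-0.5 * (x_star[:, None] - x[None, :]) ** 2), y, f_hat)
```

At convergence the mode satisfies \(\hat{f} = K\nabla\log p(y\mid\hat{f})\), and the approximate latent mean at test inputs is \(K_{*f}\nabla\log p(y\mid\hat{f})\), which is what `latent_test_mean` computes.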