Abstract: A continuing challenge for machine learning is providing methods to perform computation on data while ensuring the data remains private. In this paper we build on the provable privacy guarantees of differential privacy, which has been combined with Gaussian processes through the previously published \emph{cloaking method}. We solve several shortcomings of this method, starting with the problem of predictions in regions with low data density. We experiment with the use of inducing points to provide a sparse approximation and show that these can provide robust differential privacy in outlier areas and at higher dimensions. We then look at classification, and modify the Laplace approximation approach to provide differentially private predictions. We then combine this with the sparse approximation and demonstrate the capability to perform classification in high dimensions. We finally explore the issue of hyperparameter selection and develop a method for their private selection. This paper and associated libraries provide a robust toolkit for combining differential privacy and GPs in a practical manner.

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper addresses a persistent challenge in machine learning: performing computations on data while keeping that data private. Specifically, it combines Differential Privacy (DP) with Gaussian Processes (GPs), building on the earlier cloaking method, to address three problems:

1. **Prediction problems in low-density regions**:
   - When making predictions in regions with low data density, the existing method is vulnerable to outliers, and the noise added to ensure differential privacy is so large that the predictions become inaccurate.
   - To address this, the paper introduces inducing points to provide a sparse approximation, which gives robust differential privacy in outlier regions and in higher-dimensional spaces.
2. **Classification problems**:
   - The differentially private GP method was initially applicable only to regression problems (Gaussian likelihood), not to classification problems.
   - The paper modifies the Laplace approximation to produce differentially private classification predictions and combines it with the sparse approximation, enabling classification in high-dimensional spaces.
3. **Hyperparameter selection problems**:
   - Although hyperparameter optimization is briefly discussed in the supplementary materials, this issue has not been fully addressed in previous literature.
   - The paper develops a method for selecting hyperparameters such that the selection itself is also differentially private.

### Formula presentation

- **Definition of differential privacy**:
  \[ P(R(D)\in m)\leq e^{\varepsilon}P(R(D')\in m)+\delta \]
  where \(R\) is the randomized algorithm, \(m\) is any set of possible outputs, \(D\) and \(D'\) are adjacent databases (differing by only one row or one individual's data), \(\varepsilon\) controls the degree of privacy loss, and \(\delta\) is a small additive slack, often read informally as the probability that the \(\varepsilon\) bound fails (so the bound holds with probability at least \(1-\delta\)).
- **Cloaking matrix**:
  \[ C = K_{*f}K^{-1} \]
  where \(K_{*f}\) is the covariance matrix between test points and training points, and \(K\) is the covariance matrix between training points.
- **Laplace approximation update formula**:
  \[ \hat{f}_{\text{new}}=(K^{-1}+W)^{-1}(W\hat{f}+\nabla\log p(y|\hat{f})) \]
  where \(W = -\nabla\nabla\log p(y|f)\) is diagonal, and for the logistic link function its elements are \(\pi_i(1-\pi_i)\), where \(\pi_i = p(y_i = 1|f_i)=(1 + e^{-f_i})^{-1}\).

Through these improvements, the paper provides a robust toolkit for combining differential privacy and Gaussian processes in practical applications, addressing the outlier sensitivity and limited scope of application of existing methods.
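To make the cloaking matrix and the Laplace update concrete, here is a minimal NumPy sketch of both formulas. It is an illustration only, not the paper's implementation: it adds no privacy noise and therefore gives no differential privacy guarantee. The squared-exponential kernel, its lengthscale, the `1e-2` diagonal term added to \(K\), the toy data, and the function names `rbf_kernel`, `cloaking_matrix`, and `laplace_mode` are all assumptions made for this example.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel (an assumed choice, not taken from the paper)."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return variance * np.exp(-0.5 * np.maximum(sq_dists, 0.0) / lengthscale**2)


def cloaking_matrix(K_star_f, K):
    """C = K_{*f} K^{-1}, using a linear solve instead of an explicit inverse."""
    # K is symmetric, so solve(K, K_star_f.T).T equals K_star_f @ inv(K).
    return np.linalg.solve(K, K_star_f.T).T


def laplace_mode(K, y, n_iter=50, tol=1e-8):
    """Newton iterations for the Laplace-approximation mode (logistic likelihood).

    y holds labels in {0, 1}. Each step applies
        f_new = (K^{-1} + W)^{-1} (W f + grad log p(y | f)),
    rewritten as (I + K W)^{-1} K (W f + grad) so K is never inverted.
    """
    n = K.shape[0]
    f = np.zeros(n)
    for _ in range(n_iter):
        pi = 1.0 / (1.0 + np.exp(-f))   # pi_i = p(y_i = 1 | f_i)
        grad = y - pi                   # gradient of log p(y | f)
        W = pi * (1.0 - pi)             # diagonal of W = -Hessian of log p(y | f)
        b = W * f + grad
        # K * W[None, :] scales the columns of K, i.e. it equals K @ diag(W).
        f_new = np.linalg.solve(np.eye(n) + K * W[None, :], K @ b)
        if np.max(np.abs(f_new - f)) < tol:
            return f_new
        f = f_new
    return f


# Toy one-dimensional data (hypothetical, for illustration only).
X = np.linspace(-2.0, 2.0, 6)[:, None]
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
X_star = np.array([[-1.0], [0.5]])

K = rbf_kernel(X, X) + 1e-2 * np.eye(len(X))  # assumed diagonal term for stability
C = cloaking_matrix(rbf_kernel(X_star, X), K)
f_hat = laplace_mode(K, y)
```

At the converged mode the update is a fixed point, so `f_hat` should satisfy \(\hat{f} = K\,\nabla\log p(y|\hat{f})\), which gives a simple check on the iteration. Solving linear systems rather than forming \(K^{-1}\) is a numerical-stability choice made for this sketch.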

Differentially Private Regression and Classification with Sparse Gaussian Processes

Differential Privacy With Variant-Noise For Gaussian Processes Classification

Differentially Private Regression with Unbounded Covariates

Differentially Private Variational Inference for Non-conjugate Models

Differential Privacy for Class-based Data: A Practical Gaussian Mechanism

Noise-Aware Differentially Private Regression via Meta-Learning

Certification for Differentially Private Prediction in Gradient-Based Training

Chasing Your Long Tails: Differentially Private Prediction in Health Care Settings

Differentially Private Optimization with Sparse Gradients

Deep Learning with Gaussian Differential Privacy

Differentially Private Learning with Small Public Data

Differentially private sub-Gaussian location estimators

Differentially Private Generalized Linear Models Revisited

Differentially Private Random Feature Model

Evaluating Differentially Private Machine Learning in Practice

Differentially Private Learning Beyond the Classical Dimensionality Regime

Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent

Differentially Private Algorithms for Empirical Machine Learning

Differentially Private SGD with Random Features

Differentially Private Post-Processing for Fair Regression

Differentially Private K-Means Publishing with Distributed Dimensions