Abstract: We study the problem of computing the privacy parameters for DP machine learning when using privacy amplification via random batching and noise correlated across rounds via a correlation matrix $\textbf{C}$ (i.e., the matrix mechanism). Past work on this problem either only applied to banded $\textbf{C}$, or gave loose privacy parameters. In this work, we give a framework for computing near-exact privacy parameters for any lower-triangular, non-negative $\textbf{C}$. Our framework allows us to optimize the correlation matrix $\textbf{C}$ while accounting for amplification, whereas past work could not. Empirically, we show this lets us achieve smaller RMSE on prefix sums than the previous state-of-the-art (SOTA). We also show that we can improve on the SOTA performance on deep learning tasks. Our two main technical tools are (i) using Monte Carlo accounting to bypass composition, which was the main technical challenge for past work, and (ii) a "balls-in-bins" batching scheme that enables easy privacy analysis and is closer to practical random batching than Poisson sampling.
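
To make the "balls-in-bins" batching idea concrete, here is a minimal sketch in Python. It rests on an assumption not spelled out in the abstract: each example is placed independently and uniformly at random into one of `num_batches` bins, and the bins are then visited in the same fixed order every epoch. The function names `balls_in_bins_batches` and `participation_rounds` are hypothetical and chosen only for illustration.

```python
import numpy as np

def balls_in_bins_batches(num_examples, num_batches, rng):
    """Assign every example to exactly one bin, uniformly at random.

    Assumption: bins are drawn independently per example, so batch
    sizes are random rather than fixed.
    """
    bins = rng.integers(0, num_batches, size=num_examples)
    return [np.flatnonzero(bins == i) for i in range(num_batches)]

def participation_rounds(bin_index, num_batches, num_epochs):
    """0-indexed rounds in which a given bin is used, assuming the same
    bin order is repeated in every epoch."""
    return [bin_index + num_batches * j for j in range(num_epochs)]

rng = np.random.default_rng(0)
batches = balls_in_bins_batches(num_examples=20, num_batches=4, rng=rng)
for i, batch in enumerate(batches):
    print(f"bin {i}: examples {batch.tolist()}, "
          f"rounds {participation_rounds(i, num_batches=4, num_epochs=3)}")
```

Under this assumption an example's participation pattern is fully determined by its bin index, which is what makes the resulting privacy analysis a mixture over only `num_batches` cases.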

What problem does this paper attempt to address?

This paper addresses the problem of computing privacy parameters for differentially private (DP) machine learning when privacy amplification from random batching is combined with noise that is correlated across rounds through a correlation matrix $C$ (the matrix mechanism). Previous work either applied only to banded $C$ or gave relatively loose privacy parameters. The paper proposes a framework for computing near-exact privacy parameters for any lower-triangular, non-negative $C$. The framework also allows $C$ to be optimized while accounting for privacy amplification, which previous work was unable to do.

### Main contributions of the paper

1. **Algorithmic contributions**:
   - **Balls-in-bins batching scheme**: A more practical sampling scheme that does not require random access to the entire dataset, achieves better privacy amplification than alternatives such as shuffling, and whose amplification can be analyzed efficiently via Monte Carlo sampling.
   - **Near-exact analysis**: Monte Carlo accounting is used to bypass composition theorems, avoiding the looseness of previous analyses and enabling near-exact privacy analysis and optimization without constraining the structure of the correlation matrix.
2. **Optimizing the correlation matrix $C$**:
   - For the first time, the paper shows how to optimize $C$ to minimize the root-mean-square error (RMSE) under privacy amplification. This removes the need for the expensive grid searches over candidate configurations that previous work required.
3. **Empirical evaluation**:
   - **RMSE analysis**: The RMSE on prefix sums is compared for correlation matrices under different amplification schemes, and the results show that the new method can significantly reduce the RMSE.
   - **CIFAR-10 experiment**: A VGG model is trained using the new privacy analysis. It matches the accuracy of the previous best method at a smaller $\epsilon$ and improves absolute accuracy by up to 1% at a larger $\epsilon$.

### Formula presentation

The formulas involved in the paper are as follows:

- **Definition of RMSE**:
  \[ \text{RMSE} = \sigma \cdot \|AC^{-1}\|_F \]
  where $A$ is the lower-triangular all-ones matrix (the prefix-sum workload) and $\|\cdot\|_F$ denotes the Frobenius norm.
- **Dominating pair**:
  \[ P_{C,\sigma} = \frac{1}{b}\sum_{i=1}^{b} N(m_i, \sigma^2 I), \quad Q_\sigma = N(0, \sigma^2 I) \]
  where $m_i = \sum_{j=0}^{E-1} |C|_{1:n,\, b\cdot j + i}$, $b$ is the number of batches per epoch, and $E$ is the number of epochs.
- **Monte Carlo accounting method**:
  \[ H_\alpha(P,Q) = \mathbb{E}_Y[\max\{1 - \alpha e^{-Y}, 0\}] \]
  where $Y = \log\left(\frac{P(X)}{Q(X)}\right)$ with $X \sim P$.

### Summary

By introducing Monte Carlo accounting and the balls-in-bins batching scheme, the paper addresses the challenges of computing privacy parameters in differentially private machine learning, in particular for non-banded correlation matrices $C$. These improvements yield more accurate privacy parameters in multiple settings and improve model performance in practical applications.
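
The dominating pair and the Monte Carlo estimate above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the paper's implementation: the function names `mixture_means` and `mc_delta` are hypothetical, the number of rounds $n$ is assumed to equal $b \cdot E$, columns are 0-indexed, and only the single direction $H_{e^\epsilon}(P, Q)$ is estimated. A complete accountant would also need to handle the reverse direction and attach a confidence bound to the Monte Carlo estimate.

```python
import numpy as np

def mixture_means(C, b):
    """Means m_i of the mixture P: sum of |C| columns i, i+b, i+2b, ...

    Assumes C is n x n with n divisible by b (n = b * E).
    """
    n = C.shape[0]
    assert n % b == 0, "sketch assumes n = b * E"
    abs_c = np.abs(C)
    return np.stack([abs_c[:, i::b].sum(axis=1) for i in range(b)])  # (b, n)

def mc_delta(C, b, sigma, eps, num_samples, rng):
    """Monte Carlo estimate of H_{e^eps}(P, Q) for the dominating pair."""
    means = mixture_means(C, b)
    # Sample X ~ P: pick a mixture component, then add Gaussian noise.
    idx = rng.integers(0, b, size=num_samples)
    x = means[idx] + sigma * rng.standard_normal((num_samples, C.shape[0]))
    # log N(x; m_i, s^2 I) - log N(x; 0, s^2 I) = (<x, m_i> - |m_i|^2 / 2) / s^2
    log_ratios = (x @ means.T - 0.5 * np.sum(means**2, axis=1)) / sigma**2
    # Privacy loss Y = log(P(x) / Q(x)), computed stably in log space.
    y = np.logaddexp.reduce(log_ratios, axis=1) - np.log(b)
    # max{1 - e^{eps - Y}, 0} == 1 - exp(min(eps - Y, 0)); avoids overflow.
    return float(np.mean(1.0 - np.exp(np.minimum(eps - y, 0.0))))

rng = np.random.default_rng(0)
C = np.tril(np.ones((4, 4)))  # toy lower-triangular, non-negative C
print(mc_delta(C, b=2, sigma=2.0, eps=1.0, num_samples=100_000, rng=rng))
```

Because the estimate is a plain sample average of a quantity bounded in $[0, 1]$, no composition theorem is involved: the whole $n$-round mechanism is treated as a single high-dimensional Gaussian mixture.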

Near Exact Privacy Amplification for Matrix Mechanisms
