Balls-and-Bins Sampling for DP-SGD

Lynn Chua, Badih Ghazi, Charlie Harrison, Ethan Leeman, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, Chiyuan Zhang
2024-12-22
Abstract: We introduce the Balls-and-Bins sampling for differentially private (DP) optimization methods such as DP-SGD. While it has been common practice to use some form of shuffling in DP-SGD implementations, privacy accounting algorithms have typically assumed that Poisson subsampling is used instead. Recent work by Chua et al. (ICML 2024), however, pointed out that shuffling-based DP-SGD can have a much larger privacy cost in practical regimes of parameters. We show that the Balls-and-Bins sampling achieves the "best-of-both" samplers, namely, the implementation of Balls-and-Bins sampling is similar to that of Shuffling and models trained using DP-SGD with Balls-and-Bins sampling achieve utility comparable to those trained using DP-SGD with Shuffling at the same noise multiplier, and yet, Balls-and-Bins sampling enjoys similar-or-better privacy amplification as compared to Poisson subsampling in practical regimes.
Machine Learning, Cryptography and Security, Data Structures and Algorithms
### What problem does this paper attempt to address?

The paper asks how to obtain stronger privacy guarantees without sacrificing model utility in differentially private (DP) optimization methods such as DP-SGD. Specifically:

1. **Limitations of existing methods**: DP-SGD implementations usually use some form of data shuffling, whereas most privacy accounting algorithms assume Poisson subsampling. Chua et al. (2024a) pointed out that, under practical parameter settings, shuffling-based DP-SGD can incur a greater privacy cost.
2. **Proposed new method**: The paper introduces a new sampling method, Balls-and-Bins sampling, that combines the advantages of shuffling and Poisson subsampling. Its implementation is similar to shuffling, yet its privacy amplification is similar to or better than that of Poisson subsampling. It can therefore match the model utility of shuffling at the same noise scale while providing better privacy protection.
3. **Core question**: Is there a batch generator that matches shuffling in implementation simplicity and model utility, but is similar to or better than Poisson subsampling in privacy analysis?

### Specific contributions

- **Balls-and-Bins sampler**: The authors propose a batch generator named Balls-and-Bins (Algorithm 3 in the paper). It assigns each example independently to a uniformly random batch. The resulting batches form a partition of the dataset, as with shuffling, while the marginal distribution of each batch is the same as under Poisson subsampling.
- **Improvement in privacy analysis**: By identifying a tightly dominating pair, the authors carry out a precise privacy analysis of Balls-and-Bins sampling. They prove that it is superior to shuffling and to the Deterministic batch generator in all parameter ranges, and superior to Poisson subsampling when ε is large.
- **Monte Carlo estimation techniques**: To estimate the privacy loss curve δ_B(ε) efficiently, the authors develop new techniques such as importance sampling and order statistics sampling. These techniques improve the efficiency of estimation and remain applicable to large-scale datasets.
- **Experimental verification**: The authors run experiments on multiple real-world datasets. The results show that DP-SGD with Balls-and-Bins sampling reaches model utility comparable to DP-SGD with shuffling at the same noise scale, while its privacy guarantee is better than that of shuffling and close to or better than that of Poisson subsampling.

### Summary

By introducing Balls-and-Bins sampling, the paper shows how DP-SGD can obtain better privacy guarantees while maintaining model utility. The method is about as simple to implement as shuffling, and in practical parameter regimes its privacy amplification is similar to or better than that of Poisson subsampling.
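
To make the difference between the three batch generators discussed above concrete, here is a minimal Python sketch. It is an illustration only, not the authors' code: the function names are hypothetical, the shuffling variant assumes the number of batches divides the dataset size, and the Poisson variant assumes a sampling probability of 1/T per batch (T being the number of batches) so that expected batch sizes match.

```python
import random


def shuffle_batches(n, num_batches, rng):
    """Shuffling: permute all indices, then cut into equal-size batches.

    Assumes num_batches divides n (an assumption of this sketch).
    """
    order = list(range(n))
    rng.shuffle(order)
    size = n // num_batches
    return [order[t * size:(t + 1) * size] for t in range(num_batches)]


def poisson_batches(n, num_batches, rng):
    """Poisson subsampling: every batch independently includes each
    example with probability q = 1 / num_batches (assumed value).
    An example may appear in several batches or in none."""
    q = 1.0 / num_batches
    return [[i for i in range(n) if rng.random() < q]
            for _ in range(num_batches)]


def balls_and_bins_batches(n, num_batches, rng):
    """Balls-and-Bins: each example (ball) is placed in exactly one
    uniformly random batch (bin). Batches partition the dataset,
    but batch sizes are random."""
    batches = [[] for _ in range(num_batches)]
    for i in range(n):
        batches[rng.randrange(num_batches)].append(i)
    return batches


if __name__ == "__main__":
    rng = random.Random(0)
    for sampler in (shuffle_batches, poisson_batches, balls_and_bins_batches):
        sizes = [len(b) for b in sampler(12, 3, rng)]
        print(sampler.__name__, sizes)
```

In this sketch, Balls-and-Bins resembles shuffling in that every example is used exactly once per pass over the data, and resembles Poisson subsampling in that any single batch contains each example independently with probability 1/T.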