Closed-Form Bounds for DP-SGD against Record-level Inference

Giovanni Cherubin, Boris Köpf, Andrew Paverd, Shruti Tople, Lukas Wutschitz, Santiago Zanella-Béguelin
2024-02-22
Abstract: Machine learning models trained with differentially-private (DP) algorithms such as DP-SGD enjoy resilience against a wide range of privacy attacks. Although it is possible to derive bounds for some attacks based solely on an $(\varepsilon,\delta)$-DP guarantee, meaningful bounds require a small enough privacy budget (i.e., injecting a large amount of noise), which results in a large loss in utility. This paper presents a new approach to evaluate the privacy of machine learning models against specific record-level threats, such as membership and attribute inference, without the indirection through DP. We focus on the popular DP-SGD algorithm, and derive simple closed-form bounds. Our proofs model DP-SGD as an information theoretic channel whose inputs are the secrets that an attacker wants to infer (e.g., membership of a data record) and whose outputs are the intermediate model parameters produced by iterative optimization. We obtain bounds for membership inference that match state-of-the-art techniques, whilst being orders of magnitude faster to compute. Additionally, we present a novel data-dependent bound against attribute inference. Our results provide a direct, interpretable, and practical way to evaluate the privacy of trained models against specific inference threats without sacrificing utility.
Cryptography and Security, Machine Learning
What problem does this paper attempt to address?
This paper addresses how to directly evaluate the privacy of a machine-learning model against specific record-level threats, such as membership inference and attribute inference, when the model is trained with a differentially private (DP) algorithm like DP-SGD, without going through the indirect DP parameters (\( \varepsilon, \delta \)). Specifically, the authors propose a method that directly computes bounds on the security of DP-SGD against these threats. The bounds are given in closed form and are orders of magnitude faster to compute than existing state-of-the-art techniques.

### Main contributions of the paper:

1. **Direct privacy measurement**: The method directly measures how well a model trained with DP-SGD resists specific privacy threats (membership inference and attribute inference), rather than deriving this from the DP parameters (\( \varepsilon, \delta \)).
2. **High computational efficiency**: The bounds are several orders of magnitude faster to compute than existing techniques.
3. **Theoretical analysis**: By modeling DP-SGD as an information-theoretic channel, the authors derive simple closed-form bounds and prove that they hold.
4. **Wide application**: The method applies to both membership inference (MIA) and attribute inference (AI). It gives stronger guarantees for AI, which allows a better utility-privacy trade-off in practice.

### Solutions to specific problems:

- **Membership Inference (MIA)**: The authors derive a closed-form bound on the security of DP-SGD against membership inference attacks. This bound matches existing DP-based methods but is faster to compute.
- **Attribute Inference (AI)**: The authors also derive a new bound for attribute inference and find that DP-SGD is more secure against attribute inference than against membership inference. This matters most for applications that need to protect sensitive attributes.

### Theoretical basis:

- **Bayes Security Measure**: The authors use the Bayes security measure to quantify the security of the model. This measure is interpretable because it relates directly to the attacker's success probability.
- **Total Variation Distance**: The authors derive the security bounds of DP-SGD by computing the total variation distance between two Gaussian distributions.

### Experimental results:

- **Comparison with PLD accounting method**: When the noise parameter \( \sigma \) is large, the bounds of the new method are very close to the estimates of the privacy loss distribution (PLD) accounting method.
- **Computational efficiency**: The new method is much faster than existing state-of-the-art techniques, especially on large-scale data sets.

In conclusion, this paper proposes an efficient, direct method for evaluating the privacy of machine-learning models trained with DP-SGD against specific record-level threats, and supports it with both theoretical analysis and experimental comparison.
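
The two ingredients named under the theoretical basis, total variation distance between Gaussians and Bayes security, can be illustrated with a minimal sketch. It assumes the simplest possible setting: a single noisy release, two equally likely secrets, and two Gaussians with the same isotropic covariance. It does not model subsampling or composition over training steps, so it is not the paper's DP-SGD bound. The function names are hypothetical. In this setting the total variation distance has the standard closed form \( \mathrm{erf}\big(\lVert \mu_0 - \mu_1 \rVert / (2\sqrt{2}\,\sigma)\big) \), the optimal attacker guesses correctly with probability \( (1 + \mathrm{TV})/2 \), and Bayes security is \( 1 - \mathrm{TV} \).

```python
import math


def gaussian_tv(mean_gap: float, sigma: float) -> float:
    """Total variation distance between N(m0, sigma^2 I) and N(m1, sigma^2 I).

    mean_gap is the Euclidean distance ||m0 - m1|| between the two means.
    Assumption: both Gaussians share the same isotropic covariance.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return math.erf(abs(mean_gap) / (2.0 * math.sqrt(2.0) * sigma))


def optimal_attack_accuracy(tv: float) -> float:
    """Success probability of the Bayes-optimal attacker, uniform prior on two secrets."""
    return 0.5 * (1.0 + tv)


def bayes_security_two_secrets(tv: float) -> float:
    """Bayes security for two equally likely secrets.

    Ratio of the attacker's error with the observation, (1 - tv) / 2,
    to the error of blind guessing, 1 / 2. Equals 1 when nothing leaks.
    """
    return 1.0 - tv


if __name__ == "__main__":
    # Hypothetical values: mean gap of 1 (e.g. a clipped gradient norm)
    # and increasing noise levels.
    for sigma in (0.5, 1.0, 4.0):
        tv = gaussian_tv(1.0, sigma)
        print(sigma, tv, optimal_attack_accuracy(tv), bayes_security_two_secrets(tv))
```

As the noise level grows relative to the gap between the means, the total variation distance shrinks toward 0 and Bayes security approaches 1, which is the qualitative behavior the summary describes for large \( \sigma \).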