Minimax rate for multivariate data under componentwise local differential privacy constraints

Chiara Amorino, Arnaud Gloter
2024-04-26
Abstract: Our research delves into the balance between maintaining privacy and preserving statistical accuracy when dealing with multivariate data that is subject to componentwise local differential privacy (CLDP). With CLDP, each component of the private data is made public through a separate privacy channel. This allows for varying levels of privacy protection for different components or for the privatization of each component by different entities, each with their own distinct privacy policies. We develop general techniques for establishing minimax bounds that shed light on the statistical cost of privacy in this context, as a function of the privacy levels $\alpha_1, \ldots, \alpha_d$ of the $d$ components. We demonstrate the versatility and efficiency of these techniques by presenting various statistical applications. Specifically, we examine nonparametric density and covariance estimation under CLDP, providing upper and lower bounds that match up to constant factors, as well as an associated data-driven adaptive procedure. Furthermore, we quantify the probability of extracting sensitive information from one component by exploiting the fact that, on another component which may be correlated with the first, a smaller degree of privacy protection is guaranteed.
Statistics Theory
What problem does this paper attempt to address?
This paper studies the trade-off between privacy and statistical accuracy for multivariate data under componentwise local differential privacy (CLDP) constraints. Specifically, it analyzes how to carry out statistical inference when each component of the private data is made public through a separate privacy channel. CLDP allows different components to have different levels of privacy protection, or to be privatized by different entities, each with its own privacy policy. It also covers practical situations in which the components of the original data cannot be privatized jointly.

The main contribution is a set of general techniques for establishing minimax bounds, which show how the statistical cost of privacy depends on the privacy levels \(\alpha_1, \ldots, \alpha_d\) of the \(d\) components. The authors apply these techniques to nonparametric density estimation and covariance estimation under CLDP, providing upper and lower bounds that match up to constant factors, together with an associated data-driven adaptive procedure. They also quantify the probability of extracting sensitive information from one component by exploiting the weaker privacy protection guaranteed on another component that may be correlated with it.

In summary, the paper characterizes the optimal balance between statistical utility and individual privacy for multivariate data under CLDP constraints, a question of theoretical interest that also offers guidance for privacy protection in practical applications.
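To make the idea of separate privacy channels concrete, the following is a minimal sketch of one possible componentwise mechanism. It is not taken from the paper: it assumes each coordinate lies in [-1, 1] and uses the standard Laplace mechanism independently on each coordinate, with noise scale 2/alpha_j for coordinate j. The function name `privatize_componentwise` and its parameters are hypothetical and chosen only for illustration.

```python
import random


def privatize_componentwise(x, alphas, rng=None):
    """Release each coordinate of x through its own Laplace channel.

    Illustrative assumption (not from the paper): every coordinate lies in
    [-1, 1], so two inputs differ by at most 2 in each coordinate. Adding
    Laplace noise of scale 2 / alpha_j then makes channel j satisfy
    alpha_j-local differential privacy for that coordinate.
    """
    rng = rng or random.Random()
    if len(x) != len(alphas):
        raise ValueError("x and alphas must have the same length")

    released = []
    for x_j, alpha_j in zip(x, alphas):
        if not -1.0 <= x_j <= 1.0:
            raise ValueError("each coordinate must lie in [-1, 1]")
        if alpha_j <= 0:
            raise ValueError("each privacy level must be positive")
        scale = 2.0 / alpha_j
        # The difference of two independent exponentials with mean `scale`
        # is a Laplace variable with location 0 and scale `scale`.
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        released.append(x_j + noise)
    return released


if __name__ == "__main__":
    # The first coordinate is strongly protected (small alpha), the second
    # one only weakly (large alpha), so its released value is less noisy.
    print(privatize_componentwise([0.3, -0.7], alphas=[0.5, 4.0]))
```

Because each coordinate is perturbed independently, no party needs to see the whole vector, which matches the setting where different entities privatize different components. A smaller alpha_j means more noise on coordinate j and hence stronger protection for it.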