Abstract: Stochastic Gradient Descent (SGD) is a widely used tool in machine learning. In the context of Differential Privacy (DP), SGD has been well studied in recent years, with the focus mainly on convergence rates and privacy guarantees. While in the non-private case uncertainty quantification (UQ) for SGD by bootstrap has been addressed by several authors, these procedures cannot be transferred to differential privacy due to multiple queries to the private data. In this paper, we propose a novel block bootstrap for SGD under local differential privacy that is computationally tractable and does not require an adjustment of the privacy budget. The method can be easily implemented and is applicable to a broad class of estimation problems. We prove the validity of our approach and illustrate its finite sample properties by means of a simulation study. As a by-product, the new method also provides a simple alternative numerical tool for UQ for non-private SGD.
What problem does this paper attempt to address?
The paper addresses uncertainty quantification (UQ) for statistical inference based on stochastic gradient descent (SGD) under local differential privacy (LDP). Existing bootstrap methods for non-private SGD cannot be applied directly in the private setting: they query the private data multiple times, and each query consumes part of the privacy budget. Either the overall privacy guarantee is weakened, or the budget per query must shrink, which requires more noise and makes the estimates inaccurate. The paper therefore proposes a new block bootstrap method that works for SGD under local differential privacy, does not require adjusting the privacy budget, is computationally tractable, and applies to a broad class of estimation problems.
### Background and Problem Description of the Paper
1. **Differential Privacy (DP)**:
- Differential privacy is a framework for protecting the privacy of individual data. By adding random noise to the data or to data-dependent statistics, it ensures that changing a single data point does not significantly change the distribution of the output.
- Local differential privacy (LDP) requires each data point to be privatized at the moment it is collected, so no trusted data curator is assumed.
2. **Stochastic Gradient Descent (SGD)**:
- SGD is an optimization algorithm that minimizes a loss function by iteratively updating the parameters with gradients computed on individual data points (or small batches).
- Because each update is cheap, SGD is a standard choice for large-scale data and complex optimization problems.
3. **Uncertainty Quantification (UQ)**:
- UQ refers to quantifying the uncertainty of model parameter estimates, usually by constructing confidence intervals or confidence regions.
- In the non-private setting, several bootstrap methods exist for UQ of SGD, but they cannot be applied directly under differential privacy.
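As a concrete illustration of the local model, the sketch below privatizes each scalar observation on the user side before it is released. It assumes the standard Laplace mechanism with clipping; the function name `laplace_randomizer` and the parameters `clip_bound` and `epsilon` are hypothetical, and the paper's privacy mechanism \(A\) may be different. Clipping to \([-c, c]\) bounds the change caused by replacing one value by \(2c\), so Laplace noise with scale \(2c/\varepsilon\) makes each released value \(\varepsilon\)-LDP.

```python
import numpy as np

def laplace_randomizer(x, clip_bound, epsilon, rng):
    """Hypothetical local randomizer: clip, then add Laplace noise.

    Clipping to [-clip_bound, clip_bound] bounds the sensitivity by
    2 * clip_bound, so noise with scale 2 * clip_bound / epsilon makes
    each released value epsilon-LDP (assumed Laplace mechanism).
    """
    x_clipped = np.clip(x, -clip_bound, clip_bound)
    scale = 2.0 * clip_bound / epsilon
    return x_clipped + rng.laplace(loc=0.0, scale=scale, size=np.shape(x))

rng = np.random.default_rng(0)
data = rng.normal(loc=0.5, scale=1.0, size=100_000)
# Each user releases only the privatized value; the raw data never leaves the user.
private = laplace_randomizer(data, clip_bound=3.0, epsilon=1.0, rng=rng)
print(private.mean(), private.var())
```

Because the noise has mean zero, averages of the released values remain unbiased for the clipped mean, but their variance is far larger than that of the raw data.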
### Existing Challenges
- **Consumption of Privacy Budget**: Traditional bootstrap methods query the private data multiple times. Each query consumes privacy budget, so under a fixed total budget every query must be noisier, which makes the resulting estimates inaccurate.
- **Computational Complexity**: Performing statistical inference efficiently while preserving privacy is difficult.
- **Scope of Application**: Existing methods often target specific models and do not generalize.
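The budget problem can be quantified with basic sequential composition: if a naive bootstrap queries each user's data \(B\) times under a fixed total budget \(\varepsilon\), each query receives only \(\varepsilon/B\), so the Laplace noise scale \(\Delta/(\varepsilon/B) = B\Delta/\varepsilon\) grows linearly in \(B\). The sketch below assumes the Laplace mechanism and basic composition; it describes the naive approach, not the paper's analysis, and the function name is hypothetical.

```python
def per_query_laplace_scale(total_epsilon, n_queries, sensitivity):
    """Laplace noise scale per query when a total budget is split evenly.

    Under basic sequential composition, n_queries queries with budget
    total_epsilon / n_queries each are jointly total_epsilon-DP. The scale
    sensitivity / (total_epsilon / n_queries) is computed in the
    algebraically equivalent form below.
    """
    return sensitivity * n_queries / total_epsilon

# Sensitivity 6.0 corresponds to clipping values to [-3, 3].
for n_queries in (1, 10, 100):
    print(n_queries, per_query_laplace_scale(1.0, n_queries, 6.0))
# prints "1 6.0", "10 60.0" and "100 600.0" on separate lines
```

With 100 bootstrap queries, every released value would carry 100 times as much noise as in a single pass, which is why re-querying the private data is not a viable route to UQ.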
### Contributions of This Paper
- **A New Block Bootstrap Method**: Under local differential privacy, the method applies a block bootstrap to the SGD iterates, which yields valid uncertainty quantification.
- **No Adjustment of the Privacy Budget**: The resampling operates only on the already privatized SGD iterates and does not query the private data again, so it consumes no additional privacy budget.
- **Computational Tractability**: The method is easy to implement and applies to a wide range of estimation problems, including convex and non-convex optimization problems.
- **Theoretical Guarantees**: The authors prove the validity (consistency) of the method under appropriate conditions and illustrate its finite-sample performance in a simulation study.
### Method Overview
1. **Principle of Block Bootstrap**:
- Divide the sequence of SGD iterates into consecutive blocks, treating each block as a single unit.
- Construct bootstrap replicates by randomly weighting the blocks, and use the replicates to approximate the distribution of the parameter estimate.
2. **Specific Steps**:
- **Input**: Initial parameter \(\theta_0\), learning rates \(\eta_i\), privacy mechanism \(A\), and data \(X_1,\ldots,X_n\).
- **Iteration**: Apply the LDP-SGD update rule to generate the iterates \(\theta_{\text{LDP}}^1,\ldots,\theta_{\text{LDP}}^n\).
- **Block Division**: Divide the iterates into \(m\) consecutive blocks of length \(l\) each.
- **Resampling**: Apply random weights to the blocks to generate bootstrap replicates \(\bar{\theta}^*\) of the averaged iterate.
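The steps above can be sketched end to end for the simplest case, estimating a mean with the loss \((\theta - x)^2/2\). This is a minimal sketch under several assumptions that are not taken from the paper: the mechanism \(A\) clips the gradient and adds Laplace noise, the learning rates are \(\eta_i = \eta_0 i^{-0.6}\), the bootstrap uses i.i.d. standard normal multipliers on centered block sums, and the block length \(l\) is fixed by hand. The function names and parameters are hypothetical, and the paper's exact weighting, scaling, and choice of \(l\) may differ. Note that the bootstrap step only touches the already privatized iterates, never the raw data.

```python
import numpy as np

def ldp_sgd_mean(data, epsilon, clip_bound, lr0, lr_decay, rng):
    """Hypothetical LDP-SGD for the mean with loss (theta - x)^2 / 2.

    Each X_i is used exactly once: its gradient theta - X_i is clipped to
    [-clip_bound, clip_bound] and released with Laplace noise (epsilon-LDP).
    """
    theta = 0.0                              # initial parameter theta_0
    scale = 2.0 * clip_bound / epsilon       # Laplace scale for clipped gradient
    iterates = np.empty(len(data))
    for i, x in enumerate(data, start=1):
        grad = np.clip(theta - x, -clip_bound, clip_bound)
        private_grad = grad + rng.laplace(0.0, scale)   # assumed mechanism A
        theta -= lr0 * i ** (-lr_decay) * private_grad  # eta_i = lr0 * i^(-lr_decay)
        iterates[i - 1] = theta
    return iterates

def block_bootstrap_ci(iterates, block_len, n_boot, alpha, rng):
    """Multiplier block bootstrap CI for the averaged iterate (one variant)."""
    m = len(iterates) // block_len           # number of complete blocks
    kept = iterates[: m * block_len]         # drop an incomplete final block
    theta_bar = kept.mean()
    # Centered block sums: S_j = sum over block j of (theta_i - theta_bar)
    block_sums = (kept - theta_bar).reshape(m, block_len).sum(axis=1)
    # Each replicate reweights the blocks with i.i.d. N(0, 1) multipliers,
    # giving one draw of theta_bar* - theta_bar
    weights = rng.standard_normal((n_boot, m))
    deviations = weights @ block_sums / len(kept)
    lo, hi = np.quantile(deviations, [alpha / 2, 1 - alpha / 2])
    # Basic bootstrap interval for the target parameter
    return theta_bar, (theta_bar - hi, theta_bar - lo)

rng = np.random.default_rng(1)
data = rng.normal(loc=0.5, scale=1.0, size=20_000)
iters = ldp_sgd_mean(data, epsilon=1.0, clip_bound=3.0, lr0=0.5, lr_decay=0.6, rng=rng)
est, (low, high) = block_bootstrap_ci(iters, block_len=500, n_boot=2000, alpha=0.05, rng=rng)
print(f"estimate {est:.3f}, 95% CI [{low:.3f}, {high:.3f}]")
```

Because the resampling is pure post-processing of the LDP-SGD output, it does not consume additional privacy budget. In this sketch the main tuning decision is the block length \(l\), which has to be long enough to capture the dependence between consecutive SGD iterates.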