Abstract: Federated learning aims to protect data privacy by collaboratively learning a model without sharing private data among users. However, an adversary may still be able to infer the private training data by attacking the released model. Differential privacy provides statistical protection against such attacks at the price of significantly degrading the accuracy or utility of the trained models. In this paper, we investigate a utility enhancement scheme based on Laplacian smoothing for differentially private federated learning (DP-Fed-LS), in which the parameter aggregation with injected Gaussian noise is improved in statistical precision without losing privacy budget. Our key observation is that the aggregated gradients in federated learning often enjoy a type of smoothness, i.e., sparsity in the graph Fourier basis with polynomial decay of Fourier coefficients as frequency grows, which Laplacian smoothing can exploit efficiently. Under a prescribed differential privacy budget, convergence error bounds with tight rates are provided for DP-Fed-LS with uniform subsampling of heterogeneous non-IID data, revealing possible utility improvements of Laplacian smoothing in effective dimensionality and variance reduction, among others. Experiments on the MNIST, SVHN, and Shakespeare datasets show that the proposed method can improve model accuracy with DP guarantees and membership privacy under both uniform and Poisson subsampling mechanisms.
What problem does this paper attempt to address?
The paper addresses how to improve model utility while protecting data privacy in federated learning. Although federated learning avoids sharing raw data by training a model collaboratively, attackers may still infer private training data by analyzing the released model. Differential privacy (DP) provides statistical protection against such attacks, but it usually causes a significant drop in the accuracy and utility of the trained model. The paper therefore studies a utility-enhancement scheme for differentially private federated learning based on Laplacian smoothing (DP-Fed-LS), which improves the statistical precision of parameter aggregation without consuming additional privacy budget.
### Main contributions:
1. **Introducing Laplacian Smoothing**: Introduce Laplacian smoothing into differentially private federated learning. It exploits the observation that aggregated gradients in federated learning are often smooth (i.e., sparse in the graph Fourier basis, with Fourier coefficients that decay polynomially as frequency grows) to reduce the variance of the noisy gradient estimate and improve its accuracy.
2. **Rigorous privacy budget guarantee**: Establish rigorous upper bounds on the differential privacy budget. These new closed-form bounds apply to both the uniform and the Poisson subsampling mechanisms and are tighter than existing results.
3. **Convergence analysis**: Derive convergence bounds for DP-Fed-LS in the strongly convex, general convex, and non-convex settings. These bounds match the convergence rates and communication complexity of federated learning without differential privacy, extended with terms that capture the effects of differential privacy and Laplacian smoothing.
4. **Experimental verification**: Train logistic regression, convolutional neural network (CNN), and long short-term memory (LSTM) models on the MNIST, SVHN, and Shakespeare datasets, showing that DP-Fed-LS improves model accuracy under the same differential privacy guarantee and membership privacy.
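To see why smoothness helps, the following minimal NumPy sketch applies Laplacian smoothing to a noisy vector. It assumes the operator used in Laplacian-smoothing gradient descent, i.e. solving `(I + sigma * L) x = g` where `L` is the periodic 1-D discrete Laplacian, computed with the FFT; the paper's exact operator and choice of `sigma` may differ, and the function name `laplacian_smooth`, the toy signal, and `sigma = 10` are illustrative assumptions. Because the smoothing acts only on the already-noised vector, it is post-processing and consumes no additional privacy budget.

```python
import numpy as np


def laplacian_smooth(g: np.ndarray, sigma: float) -> np.ndarray:
    """Solve (I + sigma * L) x = g for x, where L is the 1-D discrete Laplacian
    with periodic boundary conditions (2 on the diagonal, -1 on the two
    neighbouring entries, wrapping around at the ends)."""
    d = g.shape[0]
    k = np.arange(d)
    # I + sigma * L is circulant, so the DFT diagonalises it. Its eigenvalues
    # (the DFT of its first column) are 1 + 4*sigma*sin^2(pi*k/d) >= 1.
    eig = 1.0 + 4.0 * sigma * np.sin(np.pi * k / d) ** 2
    return np.real(np.fft.ifft(np.fft.fft(g) / eig))


# Hypothetical "smooth" vector: a few low-frequency cosines whose amplitudes
# decay like 1/k^2, mimicking polynomially decaying Fourier coefficients.
rng = np.random.default_rng(0)
d = 1000
t = np.arange(d) / d
clean = sum(np.cos(2 * np.pi * k * t) / k**2 for k in range(1, 6))
noisy = clean + rng.normal(scale=1.0, size=d)  # white Gaussian noise, as in DP
smoothed = laplacian_smooth(noisy, sigma=10.0)

mse_noisy = np.mean((noisy - clean) ** 2)
mse_smoothed = np.mean((smoothed - clean) ** 2)
print(f"MSE before smoothing: {mse_noisy:.3f}, after: {mse_smoothed:.3f}")
```

The eigenvalue is 1 at the zero frequency and grows with frequency, so the filter leaves the mean unchanged, barely touches the low-frequency signal, and shrinks the high-frequency components. A smooth vector therefore loses little, while white Gaussian noise, whose energy is spread evenly over all frequencies, loses most of it.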
### Background and related work:
- **Risks in federated learning**: Although federated learning trains models without directly accessing the raw data, it remains exposed to attacks such as model poisoning and model inversion, and the latter can leak private training data.
- **Differential privacy**: Differential privacy protects privacy by adding calibrated noise to an output or update, so that the distribution of the released result changes only slightly when any single record is added or removed.
- **Differential privacy in distributed settings**: Differential privacy has been applied in various distributed learning scenarios, but existing methods require a large noise level when the number of clients is small, which significantly degrades model utility.
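The noise-adding step described in the background can be illustrated with the textbook Gaussian mechanism. This sketch clips each record's vector to L2 norm `clip_norm`, sums the clipped vectors, and adds Gaussian noise with the classic scale `sensitivity * sqrt(2 ln(1.25/delta)) / epsilon` from Dwork and Roth's textbook, which holds for `0 < epsilon < 1`. It is a swapped-in standard calibration, not the tighter subsampled bounds derived in the paper, and the helper names `clip_l2`, `gaussian_sigma`, and `private_sum` are hypothetical.

```python
import math

import numpy as np


def clip_l2(x: np.ndarray, clip_norm: float) -> np.ndarray:
    """Scale x down so that its L2 norm is at most clip_norm."""
    norm = np.linalg.norm(x)
    return x * min(1.0, clip_norm / norm) if norm > 0 else x.copy()


def gaussian_sigma(sensitivity: float, epsilon: float, delta: float) -> float:
    """Classic Gaussian-mechanism noise scale (valid only for 0 < epsilon < 1)."""
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("classic bound requires 0 < epsilon < 1 and 0 < delta < 1")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def private_sum(records: np.ndarray, clip_norm: float, epsilon: float,
                delta: float, rng: np.random.Generator) -> np.ndarray:
    """Release the sum of per-record vectors under (epsilon, delta)-DP."""
    clipped = np.stack([clip_l2(r, clip_norm) for r in records])
    # Adding or removing one record changes the sum by at most clip_norm in L2.
    std = gaussian_sigma(clip_norm, epsilon, delta)
    return clipped.sum(axis=0) + rng.normal(scale=std, size=records.shape[1])


rng = np.random.default_rng(0)
records = rng.normal(size=(100, 10))
noisy_sum = private_sum(records, clip_norm=1.0, epsilon=0.5, delta=1e-5, rng=rng)
print("noise std per coordinate:", gaussian_sigma(1.0, 0.5, 1e-5))
```

The noise scale does not depend on how many records are summed, so the relative error of an averaged result shrinks only as more participants contribute; with few clients the noise dominates, which is the utility problem noted above.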
### Method:
- **Algorithm description**: In each communication round of DP-Fed-LS, the server sends the global model to a randomly selected subset of clients. Each client updates the model on its local data and sends the result back to the server, which aggregates the local models and applies Laplacian smoothing to produce the new global model.
- **Laplacian Smoothing**: Laplacian smoothing is implemented with the fast Fourier transform; it exploits the smoothness of the aggregated gradients in the Fourier basis to damp the high-frequency components of the injected noise.
- **Convergence and privacy guarantees**: Rigorous derivations establish privacy guarantees for DP-Fed-LS and convergence bounds in the strongly convex, general convex, and non-convex settings.
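The round described above can be sketched end to end on a toy least-squares problem. This is a minimal sketch under assumptions the summary does not state: clients send model updates (local model minus global model, which carries the same information as the local model since the server knows the global one), the server clips each update to L2 norm `clip`, adds Gaussian noise with standard deviation `noise_mult * clip` to their sum, divides by the cohort size (the expected size under Poisson subsampling), and then applies Laplacian smoothing to the noisy average. The paper may inject the noise elsewhere, and `sample_clients`, `local_update`, `dp_fed_ls_round`, and all parameter values are hypothetical. The sketch does not track the privacy budget; calibrating `noise_mult` to a target `(epsilon, delta)` would need an accountant such as the paper's closed-form bounds.

```python
import numpy as np


def laplacian_smooth(g, sigma):
    """Solve (I + sigma*L) x = g via FFT; L is the periodic 1-D discrete Laplacian."""
    d = g.shape[0]
    eig = 1.0 + 4.0 * sigma * np.sin(np.pi * np.arange(d) / d) ** 2
    return np.real(np.fft.ifft(np.fft.fft(g) / eig))


def sample_clients(n_clients, rate, mode, rng):
    """Hypothetical client selection for one round."""
    if mode == "uniform":  # fixed-size cohort drawn without replacement
        m = max(1, int(round(rate * n_clients)))
        return rng.choice(n_clients, size=m, replace=False)
    if mode == "poisson":  # each client joins independently with probability rate
        return np.flatnonzero(rng.random(n_clients) < rate)
    raise ValueError(f"unknown subsampling mode: {mode}")


def local_update(w, X, y, lr, steps):
    """Run a few full-batch gradient steps on the client's least-squares loss
    and return the model update (local model minus global model)."""
    w_local = w.copy()
    for _ in range(steps):
        w_local -= lr * X.T @ (X @ w_local - y) / len(y)
    return w_local - w


def dp_fed_ls_round(w, clients, rate, mode, clip, noise_mult, ls_sigma,
                    rng, lr=0.1, steps=5):
    selected = sample_clients(len(clients), rate, mode, rng)
    total = np.zeros_like(w)
    for i in selected:
        X, y = clients[i]
        delta = local_update(w, X, y, lr, steps)
        norm = np.linalg.norm(delta)
        if norm > clip:  # clip so one client moves the sum by at most `clip`
            delta = delta * (clip / norm)
        total += delta
    # Assumption: central Gaussian noise on the sum, added even if
    # Poisson sampling happened to select no client.
    total += rng.normal(scale=noise_mult * clip, size=w.shape)
    cohort = len(selected) if mode == "uniform" else rate * len(clients)
    return w + laplacian_smooth(total / cohort, ls_sigma)


# Usage on synthetic data: 50 clients, each with 30 noisy linear measurements.
rng = np.random.default_rng(0)
d = 20
w_true = rng.normal(size=d)
clients = []
for _ in range(50):
    X = rng.normal(size=(30, d))
    clients.append((X, X @ w_true + 0.1 * rng.normal(size=30)))

w = np.zeros(d)
for _ in range(100):
    w = dp_fed_ls_round(w, clients, rate=0.2, mode="poisson", clip=1.0,
                        noise_mult=1.0, ls_sigma=1.0, rng=rng)
print("distance to w_true:", np.linalg.norm(w - w_true))
```

Dividing by the expected cohort size under Poisson subsampling keeps the sensitivity of the average fixed at `clip / (rate * n_clients)` no matter how many clients were actually selected, a common design choice in differentially private federated averaging.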
### Experimental results:
- **Model performance**: Experiments on the MNIST, SVHN, and Shakespeare datasets show that DP-Fed-LS improves model accuracy while maintaining the same differential privacy guarantee.
- **Communication complexity**: To reach the same optimization error, DP-Fed-LS requires communication comparable to existing methods, and in some cases less.
In conclusion, the paper mitigates the utility loss of differentially private federated learning by introducing Laplacian smoothing, offering a new way to improve model performance while protecting data privacy.