Inference and Interference: The Role of Clipping, Pruning and Loss Landscapes in Differentially Private Stochastic Gradient Descent

Lauren Watson, Eric Gan, Mohan Dantam, Baharan Mirzasoleiman, Rik Sarkar
DOI: https://doi.org/10.48550/arXiv.2311.06839
2023-11-12
Abstract: Differentially private stochastic gradient descent (DP-SGD) is known to have poorer training and test performance on large neural networks, compared to ordinary stochastic gradient descent (SGD). In this paper, we perform a detailed study and comparison of the two processes and unveil several new insights. By comparing the behavior of the two processes separately in early and late epochs, we find that while DP-SGD makes slower progress in early stages, it is the behavior in the later stages that determines the end result. This separate analysis of the clipping and noise addition steps of DP-SGD shows that while noise introduces errors to the process, gradient descent can recover from these errors when it is not clipped, and clipping appears to have a larger impact than noise. These effects are amplified in higher dimensions (large neural networks), where the loss basin occupies a lower dimensional space. We argue theoretically and using extensive experiments that magnitude pruning can be a suitable dimension reduction technique in this regard, and find that heavy pruning can improve the test accuracy of DP-SGD.
Machine Learning, Cryptography and Security
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses the **poor performance of differentially private stochastic gradient descent (DP-SGD) when training large neural networks**. Compared with ordinary stochastic gradient descent (SGD), DP-SGD achieves worse training and test performance. By studying and comparing the two processes in detail, the authors offer several new insights into this performance gap.

#### Main problems and challenges

1. **Early stage vs. late stage**: Previous research assumed that DP-SGD performs poorly in the early stage of optimization and therefore fails to find a good loss basin. The paper's experiments indicate that, although DP-SGD makes slower progress early on, its behavior in the later stage is what determines the end result.
2. **Clipping and noise addition**: Both operations introduce errors. Gradient descent can recover from the errors caused by noise when gradients are not clipped, and clipping appears to have a larger impact on model performance than noise. These effects are amplified in high-dimensional spaces.
3. **Loss basin in high-dimensional space**: In high dimensions, the loss basin occupies a lower-dimensional space, which makes it harder for DP-SGD to find and stay at the bottom of the basin.
4. **Number of model parameters**: As the number of model parameters increases, the performance of DP-SGD degrades significantly, because noise is added in every dimension and its total magnitude therefore grows with model size.

#### Solutions

To address these problems, the authors propose and evaluate the following:

- **Magnitude pruning**: Reducing the number of model parameters reduces the impact of noise and thereby improves the performance of DP-SGD. Experiments show that heavy pruning can improve the test accuracy of DP-SGD.
- **Phased training strategy**: Training is divided into two phases (Phase 1 and Phase 2), with SGD and DP-SGD assigned to the different phases. The authors find that the method used in the later phase has the greater impact on final performance.
- **Theoretical analysis**: By defining a term \( R \) based on the variance in each dimension and the norm of the true gradient, the authors quantify the impact of pruning on gradient descent and argue that pruning can reduce the harmful effects of the clipping operation.

### Summary

This paper analyzes why DP-SGD underperforms when training large neural networks and proposes magnitude pruning as a way to mitigate the problem, improving the practicality and performance of DP-SGD.
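
The clipping and noise-addition steps discussed above can be made concrete with a minimal NumPy sketch of a single DP-SGD update. This is the textbook form of the update, not code from the paper; the function name `dp_sgd_step` and the parameters `clip_norm` and `noise_multiplier` are illustrative assumptions.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD update (illustrative): clip each example's gradient, sum, add noise, average."""
    # L2 norm of each per-example gradient (one row per example).
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    # Shrink gradients whose norm exceeds clip_norm; smaller ones are left unchanged.
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    # Gaussian noise with standard deviation noise_multiplier * clip_norm in every coordinate.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / per_example_grads.shape[0]
    return params - lr * noisy_mean

# Usage: two examples, one with a large gradient that gets clipped.
rng = np.random.default_rng(0)
params = np.zeros(2)
grads = np.array([[3.0, 4.0], [0.3, 0.4]])
new_params = dp_sgd_step(params, grads, lr=0.1, clip_norm=1.0,
                         noise_multiplier=1.0, rng=rng)
```

Setting `noise_multiplier=0.0` isolates the effect of clipping, and setting `clip_norm` very large isolates the effect of noise, which loosely mirrors the separate analysis of the two steps described in the abstract.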
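
Magnitude pruning, mentioned above as the dimension-reduction technique, can be sketched as a mask that keeps only the largest-magnitude weights. The helper name `magnitude_prune_mask` and the `sparsity` parameter are hypothetical, and the paper's exact pruning schedule is not given in this summary, so this shows only the generic technique.

```python
import numpy as np

def magnitude_prune_mask(weights, sparsity):
    """Boolean mask that keeps the largest-magnitude entries.

    `sparsity` is the fraction of entries to remove (0.0 keeps everything).
    """
    flat = np.abs(weights).ravel()
    n_prune = int(sparsity * flat.size)
    if n_prune == 0:
        return np.ones(weights.shape, dtype=bool)
    # Indices of the n_prune smallest magnitudes (order among them is irrelevant).
    prune_idx = np.argpartition(flat, n_prune - 1)[:n_prune]
    mask = np.ones(flat.size, dtype=bool)
    mask[prune_idx] = False
    return mask.reshape(weights.shape)

# Usage: prune half of a small weight vector, then zero out the pruned entries.
weights = np.array([0.1, -2.0, 0.05, 3.0])
mask = magnitude_prune_mask(weights, sparsity=0.5)
pruned_weights = weights * mask
```

In a DP-SGD setting, the same mask would typically also be applied to the gradients and the added noise, so that training takes place only in the retained dimensions.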
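
A short calculation illustrates why the parameter count matters for the noise. It uses the standard Gaussian mechanism and is not the paper's term \( R \). The symbols are assumptions introduced here: \( d \) is the number of trainable parameters, \( C \) the clipping norm, \( \sigma \) the noise multiplier, \( B \) the batch size, and \( g_i \) the gradient of example \( i \). With noise \( z \sim \mathcal{N}(0, \sigma^2 C^2 I_d) \) added to the sum of clipped gradients:

\[
\mathbb{E}\,\lVert z \rVert_2^2 = \sigma^2 C^2 d,
\qquad
\Bigl\lVert \sum_{i=1}^{B} \mathrm{clip}_C(g_i) \Bigr\rVert_2 \le B\,C ,
\]

so the root-mean-square noise norm relative to the largest possible signal norm is \( \sigma \sqrt{d} / B \). The signal bound does not depend on \( d \), while the noise grows as \( \sqrt{d} \). Pruning a fraction \( s \) of the parameters replaces \( d \) with \( (1 - s)\,d \) and shrinks this ratio by a factor of \( \sqrt{1 - s} \).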