Abstract: Differentially Private Stochastic Gradient Descent (DP-SGD) limits the amount of private information deep learning models can memorize during training. This is achieved by clipping and adding noise to the model's gradients, and thus networks with more parameters require proportionally stronger perturbation. As a result, large models have difficulties learning useful information, rendering training with DP-SGD exceedingly difficult on more challenging training tasks. Recent research has focused on combating this challenge through training adaptations such as heavy data augmentation and large batch sizes. However, these techniques further increase the computational overhead of DP-SGD and reduce its practical applicability. In this work, we propose using the principle of sparse model design to solve precisely such complex tasks with fewer parameters, higher accuracy, and in less time, thus serving as a promising direction for DP-SGD. We achieve such sparsity by design by introducing equivariant convolutional networks for model training with Differential Privacy. Using equivariant networks, we show that small and efficient architecture design can outperform current state-of-the-art models with substantially lower computational requirements. On CIFAR-10, we achieve an increase of up to $9\%$ in accuracy while reducing the computation time by more than $85\%$. Our results are a step towards efficient model architectures that make optimal use of their parameters and bridge the privacy-utility gap between private and non-private deep learning for computer vision.
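The clipping-and-noise step described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, assuming flattened per-example gradients and the Gaussian noise mechanism commonly used in DP-SGD; the function names `dp_sgd_step` and `noise_norm` are hypothetical, and privacy accounting is omitted entirely. It also shows one way to read the claim that larger networks need stronger perturbation: the per-coordinate noise standard deviation is fixed, but the norm of the whole noise vector grows roughly with the square root of the parameter count, while each clipped per-example gradient stays bounded by the clipping threshold.

```python
import numpy as np


def dp_sgd_step(per_example_grads, clip_norm, noise_multiplier, rng):
    """Return a noisy, clipped average gradient (hypothetical sketch of one DP-SGD step).

    per_example_grads: shape (batch_size, num_params), one flattened
    gradient per training example. No privacy accounting is done here.
    """
    batch_size, num_params = per_example_grads.shape
    # L2 norm of each example's gradient.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    # Rescale so that every per-example gradient has norm <= clip_norm.
    factors = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * factors
    # Gaussian noise with std noise_multiplier * clip_norm on every coordinate.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=num_params)
    return (clipped.sum(axis=0) + noise) / batch_size


def noise_norm(num_params, clip_norm=1.0, noise_multiplier=1.0, seed=0):
    """L2 norm of one DP-SGD noise vector for a model with num_params weights."""
    rng = np.random.default_rng(seed)
    return np.linalg.norm(rng.normal(0.0, noise_multiplier * clip_norm, size=num_params))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grads = rng.normal(size=(32, 1_000))  # toy per-example gradients
    update = dp_sgd_step(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
    print(update.shape)
    # The noise norm grows roughly like sqrt(num_params);
    # the norm of each clipped gradient is capped at clip_norm regardless of size.
    for d in (10_000, 1_000_000):
        print(d, round(float(noise_norm(d)), 1))
```

Because the clipped signal per example is capped at `clip_norm` no matter how many parameters the model has, a model with 100 times more parameters receives a noise vector with roughly 10 times the norm, which is one intuition for why fewer, better-used parameters can help under DP-SGD.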

What problem does this paper attempt to address?

This paper addresses how to make deep-learning training with Differential Privacy (DP) more practical and efficient without weakening the privacy guarantee. It focuses on a specific difficulty of Differentially Private Stochastic Gradient Descent (DP-SGD): because every gradient is clipped and perturbed with noise, models struggle to learn useful information, and the problem becomes more pronounced as tasks become more complex. This tension between privacy and accuracy is known as the privacy-utility trade-off.

The paper proposes sparse model design, in particular Equivariant Convolutional Neural Networks (ECNNs), to reduce the number of model parameters while improving both accuracy and training efficiency. By building equivariance into the architecture, ECNNs require substantially fewer computational resources while maintaining performance, and the approach works especially well when the dataset is small or when stronger privacy protection is required.

The main contributions of the paper are:
- Introducing the methods required to train ECNNs, including new normalization layers that both preserve equivariance and meet the requirements of differential privacy.
- Substantially improving the current state of the art on differentially private image-classification benchmarks by exploiting sparsity by design, without using additional data.
- Providing insights into model calibration, a known weakness of DP-SGD. The proposed equivariant architectures improve calibration, with an average Brier score 17% lower than that of conventional networks.
- Showing experimentally that equivariant networks are more robust to the choice of key hyperparameters, such as augmentation in the input domain and batch size, and analyzing how equivariance-specific hyperparameters, such as the choice of symmetry group, affect training under DP-SGD.
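The idea of equivariance as sparsity by design can be illustrated with a toy rotation-equivariant layer. The NumPy sketch below is hypothetical and is neither the paper's architecture nor any library's API: it implements a "lifting" convolution for the group C4 of 90° rotations, in the style of group-equivariant CNNs, where a single kernel is reused in four rotated orientations. C4 is chosen here only for simplicity. The script checks numerically that rotating the input by 90° rotates every output map and cyclically shifts the orientation channels, and it compares free-parameter counts with an ordinary convolution that produces the same number of maps.

```python
import numpy as np


def correlate2d_valid(image, kernel):
    """Plain 'valid' 2D cross-correlation (stride 1, no padding)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def c4_lifting_conv(image, kernel):
    """Hypothetical C4 lifting layer: correlate one image with 4 rotated copies of one kernel.

    Output shape: (4, H', W'). Channel r uses np.rot90(kernel, r), so only
    kernel.size weights are free even though 4 feature maps are produced.
    """
    return np.stack([correlate2d_valid(image, np.rot90(kernel, r)) for r in range(4)])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.normal(size=(8, 8))  # square input so rot90 keeps its shape
    kernel = rng.normal(size=(3, 3))

    out = c4_lifting_conv(image, kernel)
    out_rot = c4_lifting_conv(np.rot90(image), kernel)
    # Equivariance: rotating the input rotates each output map and
    # cyclically shifts the orientation axis by one step.
    expected = np.rot90(np.roll(out, 1, axis=0), axes=(1, 2))
    print("equivariant:", np.allclose(out_rot, expected))

    # Free parameters needed for 4 output maps from 1 input channel.
    print("ordinary conv:", 4 * kernel.size, "C4 lifting conv:", kernel.size)
```

Weight sharing across orientations is what reduces the parameter count here: four orientation-specific filters are stored as one. Larger symmetry groups reuse each filter in more orientations, and the summary above notes that the paper studies how such choices affect training under DP-SGD.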

Equivariant Differentially Private Deep Learning: Why DP-SGD Needs Sparser Models

A(DP)$^2$SGD: Asynchronous Decentralized Parallel Stochastic Gradient Descent with Differential Privacy

Towards Efficient and Scalable Training of Differentially Private Deep Learning

Sparsity-Preserving Differentially Private Training of Large Embedding Models

An Efficient DP-SGD Mechanism for Large Scale NLP Models

Enhancing DP-SGD through Non-monotonous Adaptive Scaling Gradient Weight

DP-FP: Differentially Private Forward Propagation for Large Models

Improving Differentially Private SGD via Randomly Sparsified Gradients

DPDR: Gradient Decomposition and Reconstruction for Differentially Private Deep Learning

Dynamic Differential-Privacy Preserving SGD

Differentially Private Convolutional Neural Networks with Adaptive Gradient Descent

Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent

DP-SGD for non-decomposable objective functions

Gradients Look Alike: Sensitivity is Often Overestimated in DP-SGD

Large Language Models Can Be Strong Differentially Private Learners

DP-LSSGD: A Stochastic Optimization Method to Lift the Utility in Privacy-Preserving ERM

Bypassing the Ambient Dimension: Private SGD with Gradient Subspace Identification

Differential Privacy Meets Neural Network Pruning