Abstract: The recent success of deep neural networks (DNNs) hinges on the availability of large-scale datasets; however, training on such datasets often poses privacy risks for sensitive training information. In this paper, we explore the power of generative models and gradient sparsity, and propose a scalable privacy-preserving generative model, DataLens, which generates synthetic data in a differentially private (DP) way given sensitive input data. Models for different downstream tasks can then be trained on the generated data while the private information remains protected. In particular, we leverage generative adversarial networks (GANs) and the PATE framework to train multiple discriminators as "teacher" models, allowing them to vote with their gradient vectors to guarantee privacy. Compared with the standard PATE privacy-preserving framework, which allows teachers to vote on one-dimensional predictions, voting on high-dimensional gradient vectors is challenging in terms of privacy preservation. Because dimension reduction techniques are required, we need to navigate a delicate trade-off between (1) the improvement of privacy preservation and (2) the slowdown of SGD convergence. To tackle this, we propose a novel dimension compression and aggregation approach, TopAgg, which combines top-k dimension compression with a corresponding noise injection mechanism. We theoretically prove that the DataLens framework guarantees differential privacy for its generated data, and provide a novel analysis of its convergence to illustrate this trade-off between privacy and convergence rate. The analysis is non-trivial because it must jointly account for gradient compression, coordinate-wise gradient clipping, and the DP mechanism. To demonstrate the practical usage of DataLens, we conduct extensive experiments on diverse datasets including MNIST, Fashion-MNIST, and the high-dimensional CelebA and Places365 datasets. We show that DataLens significantly outperforms other baseline differentially private data generative models. Our code is publicly available at https://github.com/AI-secure/DataLens.
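
To make the aggregation idea concrete, the following is a minimal sketch of top-k compression followed by noisy voting, written with the Python standard library only. It is an illustration under stated assumptions, not the DataLens implementation: the function names `top_k_sign` and `noisy_aggregate`, the parameters `k`, `clip`, and `sigma`, and the choice to reduce each kept coordinate to a sign vote in {-1, 0, +1} are all hypothetical choices made for this example. The abstract only states that TopAgg combines top-k compression with noise injection and that the analysis involves coordinate-wise clipping. The sketch also performs no privacy accounting.

```python
import random


def top_k_sign(grad, k, clip):
    """Clip each coordinate to [-clip, clip], keep the k largest-magnitude
    coordinates, and replace them with their signs (all others become 0).
    The sign-vote encoding is an assumption of this sketch."""
    clipped = [max(-clip, min(clip, g)) for g in grad]
    # Indices of the k coordinates with the largest absolute value.
    top = sorted(range(len(clipped)), key=lambda i: abs(clipped[i]), reverse=True)[:k]
    votes = [0] * len(clipped)
    for i in top:
        if clipped[i] > 0:
            votes[i] = 1
        elif clipped[i] < 0:
            votes[i] = -1
    return votes


def noisy_aggregate(teacher_grads, k, clip, sigma, rng):
    """Sum the teachers' compressed votes, then add independent Gaussian
    noise with standard deviation sigma to every coordinate."""
    dim = len(teacher_grads[0])
    totals = [0.0] * dim
    for grad in teacher_grads:
        for i, vote in enumerate(top_k_sign(grad, k, clip)):
            totals[i] += vote
    return [t + rng.gauss(0.0, sigma) for t in totals]


if __name__ == "__main__":
    # Three hypothetical teacher gradients over four coordinates.
    grads = [
        [0.9, -0.1, 0.05, -2.0],
        [1.2, 0.2, -0.01, -0.7],
        [0.4, -0.3, 0.0, -1.5],
    ]
    print(noisy_aggregate(grads, k=2, clip=1.0, sigma=1.0, rng=random.Random(0)))
```

Because each teacher contributes at most `k` nonzero votes of magnitude 1, the influence of any single teacher on the aggregate is bounded in this sketch. This is the kind of property that lets a smaller `k` reduce the noise needed for privacy while discarding more gradient information, which is the trade-off with convergence that the abstract describes.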

DataLens: Scalable Privacy Preserving Training via Gradient Compression and Aggregation

Privacy-Preserving Collaborative Deep Learning with Unreliable Participants

Private Knowledge Transfer via Model Distillation with Generative Adversarial Networks

PKDGAN: Private Knowledge Distillation with Generative Adversarial Networks

G-PATE: Scalable Differentially Private Data Generator via Private Aggregation of Teacher Discriminators

Scalable Differentially Private Generative Student Model via PATE

Leveraging Programmatically Generated Synthetic Data for Differentially Private Diffusion Training

Privacy without Noisy Gradients: Slicing Mechanism for Generative Model Training

PATE-TripleGAN: Privacy-Preserving Image Synthesis with Gaussian Differential Privacy

Generating Artificial Data for Private Deep Learning

Differentially Private Generative Adversarial Network

Privacy-Preserving High-dimensional Data Collection with Federated Generative Autoencoder

Training generative models from privatized data

Differentially Private Synthetic Data Generation via Lipschitz-Regularised Variational Autoencoders

An Efficient DP-SGD Mechanism for Large Scale NLP Models

Private Dataset Generation Using Privacy Preserving Collaborative Learning

Approximate, Adapt, Anonymize (3A): a Framework for Privacy Preserving Training Data Release for Machine Learning

Federated Synthetic Data Generation with Differential Privacy

Differentially Private Latent Diffusion Models

Differentially Private Convolutional Neural Networks with Adaptive Gradient Descent

Differentially Private Denoise Diffusion Probability Models