Federated Synthetic Data Generation with Differential Privacy

Bangzhou Xin, Yangyang Geng, Teng Hu, Sheng Chen, Wei Yang, Shaowei Wang, Liusheng Huang
DOI: https://doi.org/10.1016/j.neucom.2021.10.027
IF: 6
2021-01-01
Neurocomputing
Abstract: Distributed machine learning has attracted much attention in the last decade with the widespread use of the Internet of Things. The Generative Adversarial Network (GAN) is a generative model with excellent empirical performance. However, in a federated learning setting the data is stored in a distributed fashion and cannot be shared for privacy reasons, which brings new challenges to training GANs. To address this issue, we propose private FL-GAN, a differentially private GAN based on federated learning. By strategically combining the Lipschitz condition with differential privacy sensitivity, our model can generate high-quality synthetic data without sacrificing the training data’s privacy. When communication between clients becomes the main bottleneck for federated learning, we propose to use a serialized model-training paradigm, which significantly reduces communication costs. Because distributed data is often non-IID in practice, which poses challenges to modeling, we further propose universal private FL-GAN to address this problem. We not only prove theoretically that our algorithms provide strict differential privacy guarantees, but also demonstrate experimentally that our models can generate satisfactory data while protecting the privacy of the training data, even when the data is non-IID.
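Illustrative sketch (not the authors' algorithm): the abstract names two ingredients, a bound on sensitivity tied to a Lipschitz condition and a serialized training paradigm in which the model is trained client by client. The toy NumPy code below shows the general shape of such a scheme under loudly stated assumptions: the model is a plain weight vector with a toy squared loss instead of a GAN, the Lipschitz condition is approximated by WGAN-style weight clipping, sensitivity is bounded by per-example gradient clipping followed by Gaussian noise, and the function names and all parameter values are hypothetical. No privacy accounting is performed.

```python
import numpy as np

def privatize_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip each per-example gradient to L2 norm <= clip_norm, sum, add
    Gaussian noise scaled to that bound, and average over the batch."""
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    factors = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * factors
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=clipped.shape[1])
    return (clipped.sum(axis=0) + noise) / len(per_example_grads)

def serial_training_round(weights, client_batches, lr, weight_bound,
                          clip_norm, noise_multiplier, rng):
    """Pass the model through the clients one after another (serialized
    training); each client takes one noisy gradient step on its local batch."""
    for batch in client_batches:
        # Toy per-example gradient of 0.5 * ||w - x||^2 (assumption: stands
        # in for a GAN discriminator gradient).
        grads = weights - batch
        g = privatize_gradient(grads, clip_norm, noise_multiplier, rng)
        # Weight clipping as a stand-in for enforcing a Lipschitz condition.
        weights = np.clip(weights - lr * g, -weight_bound, weight_bound)
    return weights

rng = np.random.default_rng(0)
# Three hypothetical clients with differently centered (non-IID) local data.
clients = [rng.normal(loc=m, size=(32, 4)) for m in (0.0, 0.5, 1.0)]
w = serial_training_round(np.zeros(4), clients, lr=0.1, weight_bound=0.5,
                          clip_norm=1.0, noise_multiplier=1.1, rng=rng)
```

In this sketch only the model weights travel from one client to the next and raw data never leaves a client, which is the property a serialized paradigm relies on; how the paper derives sensitivity from the Lipschitz condition and accounts for the privacy budget is not reproduced here.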