Preserving Privacy in GANs Against Membership Inference Attack

Mohammadhadi Shateri, Francisco Messina, Fabrice Labeau, Pablo Piantanida
2023-11-06
Abstract: Generative Adversarial Networks (GANs) have been widely used for generating synthetic data in cases where the real-world dataset is of limited size or data holders are unwilling to share their data samples. Recent works showed that GANs, due to overfitting and memorization, might leak information regarding their training data samples. This makes GANs vulnerable to Membership Inference Attacks (MIAs). Several defense strategies have been proposed in the literature to mitigate this privacy issue. Unfortunately, defense strategies based on differential privacy have been shown to substantially reduce the quality of the synthetic data points. On the other hand, more recent frameworks such as PrivGAN and PAR-GAN are not suitable for small training datasets. In the present work, the overfitting in GANs is studied in terms of the discriminator, and a more general measure of overfitting based on the Bhattacharyya coefficient is defined. Then, inspired by Fano's inequality, our first defense mechanism against MIAs is proposed. This framework, which requires only a simple modification in the loss function of GANs, is referred to as the maximum entropy GAN or MEGAN and significantly improves the robustness of GANs to MIAs. As a second defense strategy, a more heuristic model based on minimizing the information leaked from generated samples about the training data points is presented. This approach is referred to as mutual information minimization GAN (MIMGAN) and uses a variational representation of the mutual information to minimize the information that a synthetic sample might leak about the whole training data set. Applying the proposed frameworks to some commonly used data sets against state-of-the-art MIAs reveals that the proposed methods can reduce the accuracy of the adversaries to the level of random guessing with a small reduction in the quality of the synthetic data samples.
Machine Learning, Cryptography and Security, Signal Processing
What problem does this paper attempt to address?
The paper addresses privacy protection in Generative Adversarial Networks (GANs) against Membership Inference Attacks (MIAs), in which an attacker tries to determine whether a particular data sample was used to train the model. Because GANs can overfit and memorize their training data, the synthetic data they generate may leak information about that data. Existing defenses have drawbacks: methods based on differential privacy substantially reduce the quality of the generated data, and frameworks such as PrivGAN and PAR-GAN are not suitable for small training datasets. To address these issues, the paper studies overfitting in terms of the discriminator, defines a more general overfitting measure based on the Bhattacharyya coefficient, and proposes two defense mechanisms:

1. **Maximum Entropy GAN (MEGAN)**: Inspired by Fano's inequality, this framework requires only a simple modification of the GAN loss function and significantly improves the robustness of GANs to MIAs.
2. **Mutual Information Minimization GAN (MIMGAN)**: A more heuristic model that uses a variational representation of mutual information to minimize the information a synthetic sample might leak about the whole training dataset.

Experiments on commonly used datasets against state-of-the-art MIAs show that both methods can reduce the attacker's accuracy to the level of random guessing with only a small reduction in the quality of the generated data.
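
As background for the Bhattacharyya coefficient mentioned above, here is a minimal Python sketch of the standard coefficient between two discrete distributions, BC(p, q) = sum_i sqrt(p_i * q_i). It equals 1 for identical distributions and 0 for distributions with disjoint support. Applying it to histograms of discriminator scores on training versus held-out samples is an illustrative assumption, not the paper's exact overfitting measure, and the function names and binning scheme are hypothetical.

```python
import math


def bhattacharyya_coefficient(p, q):
    """Standard Bhattacharyya coefficient between two discrete distributions.

    p and q are non-negative weights over the same bins; both are
    normalized to sum to 1 before the coefficient is computed.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    p_total, q_total = sum(p), sum(q)
    if p_total <= 0 or q_total <= 0:
        raise ValueError("each distribution must have positive total mass")
    return sum(math.sqrt((a / p_total) * (b / q_total)) for a, b in zip(p, q))


def histogram(scores, bins=10):
    """Count scores in [0, 1] into equal-width bins (hypothetical helper)."""
    counts = [0] * bins
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError("scores are assumed to lie in [0, 1]")
        # A score of exactly 1.0 is placed in the last bin.
        counts[min(int(s * bins), bins - 1)] += 1
    return counts


def score_overlap(train_scores, holdout_scores, bins=10):
    """Overlap between discriminator-score histograms (illustrative assumption).

    A value near 1 means the discriminator scores training and held-out
    samples similarly; a value near 0 means it separates them, which is
    the kind of gap a membership inference attack could exploit.
    """
    return bhattacharyya_coefficient(
        histogram(train_scores, bins), histogram(holdout_scores, bins)
    )
```

For example, `score_overlap([0.95, 0.95], [0.05, 0.05])` gives 0 because the two score histograms share no bin, whereas identical score lists give a value of 1 up to floating-point error.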