Black-Box Training Data Identification in GANs via Detector Networks

Lukman Olagoke, Salil Vadhan, Seth Neel
2023-10-18
Abstract: Since their inception, Generative Adversarial Networks (GANs) have been popular generative models across images, audio, video, and tabular data. In this paper we study whether, given access to a trained GAN as well as fresh samples from the underlying distribution, it is possible for an attacker to efficiently identify whether a given point is a member of the GAN's training data. This is of interest both for copyright, where a user may want to determine whether their copyrighted data has been used to train a GAN, and for the study of data privacy, where the ability to detect training set membership is known as a membership inference attack. Unlike the majority of prior work, this paper investigates the privacy implications of using GANs in black-box settings, where the attacker only has access to samples from the generator, rather than access to the discriminator as well. We introduce a suite of membership inference attacks against GANs in the black-box setting and evaluate our attacks on image GANs trained on the CIFAR10 dataset and tabular GANs trained on genomic data. Our most successful attack, called The Detector, involves training a second network to score samples based on their likelihood of being generated by the GAN, as opposed to being a fresh sample from the distribution. We prove under a simple model of the generator that the detector is an approximately optimal membership inference attack. Across a wide range of tabular and image datasets, attacks, and GAN architectures, we find that adversaries can orchestrate non-trivial privacy attacks when provided with access to samples from the generator. At the same time, the attack success achievable against GANs still appears to be lower than against other generative and discriminative models; this leaves the intriguing open question of whether GANs are in fact more private, or whether it is a matter of developing stronger attacks.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper studies training data identification in Generative Adversarial Networks (GANs) under a black-box setting: can an attacker who only has access to samples from the generator determine whether a given data point belongs to the GAN's training set? The question matters for both copyright protection and data privacy.

### Research Background and Motivation

- **Research Background**: Since GANs were proposed in 2014, they have achieved remarkable success in generating images, audio, video, and tabular data. When training data is sensitive, sharing it directly may leak private information, so researchers tend to share trained generative models, or synthetic datasets produced by those models, instead.
- **Motivation**: Although this approach appears to protect privacy, previous studies have shown that sharing the generative model alone can still leak private information about the training data. The paper notes that attacks against GANs have so far been less successful than attacks against other generative models such as diffusion models, which could make GANs attractive for highly sensitive data, although whether GANs are in fact more private remains an open question. Most existing studies focus on the "white-box" setting, where the attacker has access to the GAN's discriminator. In practice, usually only the generator is shared.

### Research Objectives

The paper aims to address the following questions:

1. **Privacy Attacks in Black-Box Settings**: Can effective privacy attacks be mounted when only generator samples are accessible?
2. **Copyright Issues**: If an image creator suspects that their work has been used to train a generative model without permission, how can they show that their image is likely to be in the training set?
### Main Contributions

- **Novel Attack Methods**: The paper develops a suite of membership inference attacks against GANs. Its main proposal is a new attack called the "Detector", together with an extended version (Augmented Detector, or ADIS). The Detector trains a second network to distinguish GAN-generated samples from fresh samples drawn from the underlying distribution, and uses that network's score on a candidate point as evidence of training set membership.
- **Theoretical Analysis**: The paper characterizes the performance of the detector attack and proves that, under a simple model of the generator, it is an approximately optimal membership inference attack.
- **Empirical Studies**: The authors evaluate the attacks across a range of GAN architectures, including image GANs trained on the CIFAR10 dataset and tabular GANs trained on genomic data. The results show that even in the black-box setting attackers can achieve non-trivial privacy attacks, although GANs appear to leak less private information than other types of generative models.

In summary, the paper combines theoretical analysis with empirical study to examine the privacy properties of GANs in the black-box setting and proposes a novel and efficient attack method.
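To make the Detector idea concrete, here is a minimal sketch of the black-box workflow in Python with NumPy. It is an illustration under stated assumptions, not the authors' implementation: a logistic regression stands in for the detector network, a hypothetical `toy_generator` that emits noisy copies of its training points stands in for a trained GAN, and all names and hyperparameters (`train_detector`, `detector_score`, `auc`, the learning rate, the dimensions) are invented for the example.

```python
import numpy as np

def sigmoid(z):
    # Clip the logits so np.exp cannot overflow.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def train_detector(gen_samples, fresh_samples, lr=0.1, epochs=200):
    """Fit a logistic regression: label 1 = generator output, 0 = fresh sample.

    Assumption: a linear model replaces the detector network for brevity.
    """
    X = np.vstack([gen_samples, fresh_samples])
    y = np.concatenate([np.ones(len(gen_samples)), np.zeros(len(fresh_samples))])
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        grad = sigmoid(X @ w + b) - y          # gradient of the log loss w.r.t. logits
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def detector_score(w, b, points):
    """Higher score = looks more like generator output = more likely a member."""
    return sigmoid(points @ w + b)

def auc(member_scores, nonmember_scores):
    """Probability that a random member outscores a random non-member (ties count half)."""
    m = np.asarray(member_scores)[:, None]
    n = np.asarray(nonmember_scores)[None, :]
    return float((m > n).mean() + 0.5 * (m == n).mean())

rng = np.random.default_rng(0)
dim, n_train = 100, 20
train_set = rng.normal(size=(n_train, dim))    # the "private" training data

def toy_generator(n):
    # Hypothetical stand-in for a GAN: noisy copies of random training points.
    idx = rng.integers(0, n_train, size=n)
    return train_set[idx] + 0.1 * rng.normal(size=(n, dim))

# The attacker sees only generator samples and fresh samples, never train_set.
w, b = train_detector(toy_generator(2000), rng.normal(size=(2000, dim)))

# Evaluation: score true members against fresh non-members.
member_scores = detector_score(w, b, train_set)
nonmember_scores = detector_score(w, b, rng.normal(size=(200, dim)))
attack_auc = auc(member_scores, nonmember_scores)
print(f"membership AUC: {attack_auc:.3f}")
```

An AUC near 0.5 would mean the scores carry no membership signal, while values closer to 1 indicate leakage. The toy generator is deliberately prone to memorization, so whatever number this prints says nothing about the attack success reported in the paper.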