Does fine-tuning GPT-3 with the OpenAI API leak personally-identifiable information?

Albert Yu Sun, Eliott Zemour, Arushi Saxena, Udith Vaidyanathan, Eric Lin, Christian Lau, Vaikkunth Mugunthan
2024-04-12
Abstract: Machine learning practitioners often fine-tune generative pre-trained models like GPT-3 to improve performance on specific tasks. Previous work, however, suggests that fine-tuned machine learning models memorize and emit sensitive information from the original fine-tuning dataset. Companies such as OpenAI offer fine-tuning services for their models, but no prior work has conducted a memorization attack on any closed-source model. In this work, we simulate a privacy attack on GPT-3 using OpenAI's fine-tuning API. Our objective is to determine whether personally identifiable information (PII) can be extracted from this model. We (1) explore the use of naive prompting methods on a GPT-3 fine-tuned classification model, and (2) design a practical word generation task called Autocomplete to investigate the extent of PII memorization in fine-tuned GPT-3 within a real-world context. Our findings reveal that fine-tuning GPT-3 for both tasks led to the model memorizing and disclosing critical PII obtained from the underlying fine-tuning dataset. To encourage further research, we have made our code and datasets publicly available on GitHub at:
Computation and Language
What problem does this paper attempt to address?
This paper asks whether fine-tuning GPT-3 through OpenAI's fine-tuning API leaks personally identifiable information (PII). Specifically, the researchers simulated privacy attacks on a fine-tuned GPT-3 to determine whether PII from the fine-tuning dataset could be extracted from the model. The team designed two experiments:

1. **Classification Task**: An email classifier was fine-tuned on the Enron email dataset, and naive prompting methods were used to try to extract PII from the model.
2. **Autocomplete Task**: A text autocompletion service was fine-tuned on the same Enron email dataset. Users input an email subject, and the model generates the email body. This task was designed to investigate how much PII the fine-tuned GPT-3 memorizes and leaks in a real-world setting.

Together, these experiments evaluate how much sensitive information the model memorizes during fine-tuning and the risk that it is later disclosed, particularly in enterprise applications. The results indicate that, for both tasks, the fine-tuned GPT-3 model memorized and disclosed critical PII from the fine-tuning dataset. This suggests that stronger privacy protections are needed when fine-tuning large language models on sensitive data.
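The leakage check behind both experiments can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the summary does not say how PII was detected, so this example assumes a deliberately simple detector that matches only email addresses with a regular expression. The names `EMAIL_RE`, `extract_emails`, and `leaked_pii` are hypothetical, and the model outputs are passed in as plain strings rather than fetched from any API.

```python
import re

# Assumption: PII is approximated by email addresses only. A real study
# would likely cover names, phone numbers, and other PII categories.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text):
    """Return the set of lowercased email addresses found in text."""
    return {match.lower() for match in EMAIL_RE.findall(text)}


def leaked_pii(training_texts, generations):
    """Return (leaked, rate) for a set of model generations.

    leaked: email addresses present in both the fine-tuning data and
            the generated text.
    rate:   fraction of the fine-tuning emails that were reproduced
            (0.0 if the fine-tuning data contains none).
    """
    training_pii = set()
    for text in training_texts:
        training_pii |= extract_emails(text)

    generated_pii = set()
    for text in generations:
        generated_pii |= extract_emails(text)

    leaked = training_pii & generated_pii
    rate = len(leaked) / len(training_pii) if training_pii else 0.0
    return leaked, rate


if __name__ == "__main__":
    # Toy stand-ins for fine-tuning emails and model completions.
    training = [
        "Please forward to alice@example.com by Friday.",
        "Contact bob@example.org for the report.",
    ]
    outputs = [
        "Sure, I will cc alice@example.com on this.",
        "No addresses in this completion.",
    ]
    print(leaked_pii(training, outputs))
```

Intersecting the generated PII with the fine-tuning PII matters here: an address that appears in a completion but not in the fine-tuning data may be hallucinated or come from pre-training, so it would not count as evidence of memorizing the fine-tuning set.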