Does fine-tuning GPT-3 with the OpenAI API leak personally-identifiable information?

Albert Yu Sun, Eliott Zemour, Arushi Saxena, Udith Vaidyanathan, Eric Lin, Christian Lau, Vaikkunth Mugunthan

2024-04-12

Abstract: Machine learning practitioners often fine-tune generative pre-trained models like GPT-3 to improve model performance on specific tasks. Previous work, however, suggests that fine-tuned machine learning models memorize and emit sensitive information from the original fine-tuning dataset. Companies such as OpenAI offer fine-tuning services for their models, but no prior work has conducted a memorization attack on any closed-source model. In this work, we simulate a privacy attack on GPT-3 using OpenAI's fine-tuning API. Our objective is to determine whether personally identifiable information (PII) can be extracted from this model. We (1) explore the use of naive prompting methods on a GPT-3 fine-tuned classification model, and (2) design a practical word generation task called Autocomplete to investigate the extent of PII memorization in fine-tuned GPT-3 within a real-world context. Our findings reveal that fine-tuning GPT-3 for both tasks led to the model memorizing and disclosing critical PII obtained from the underlying fine-tuning dataset. To encourage further research, we have made our code and datasets publicly available on GitHub at:

Computation and Language

What problem does this paper attempt to address?

This paper asks whether fine-tuning GPT-3 through OpenAI's fine-tuning API leaks personally identifiable information (PII). The researchers simulated privacy attacks on a fine-tuned GPT-3 to determine whether PII from the fine-tuning dataset could be extracted from the model. They designed two experiments:

1. **Classification Task**: An email classifier was fine-tuned on the Enron email dataset, and naive prompting was used to try to extract PII from the model.
2. **Auto-completion Task**: A text auto-completion service was fine-tuned on the same Enron email dataset. Users input an email subject, and the model generates the email body. This task probes how much PII the fine-tuned GPT-3 memorizes and leaks in a real-world application scenario.

Together, the two experiments evaluate how much sensitive information the model memorizes during fine-tuning and the resulting risk of leakage, particularly in enterprise applications. The results indicate that, for both tasks, the fine-tuned GPT-3 model memorizes and discloses critical PII from the fine-tuning dataset. This suggests that additional privacy protection measures are needed when fine-tuning large language models on sensitive data.
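As a rough illustration of the auto-completion setup described above (email subject in, email body out), the sketch below builds prompt/completion records and checks generated text for verbatim PII. It is a minimal sketch under stated assumptions, not the authors' pipeline: the toy emails, the function names, and the simple substring matching rule are all hypothetical, and no OpenAI API call is made. The `generations` argument stands in for whatever text a fine-tuned model returns.

```python
import json

# Hypothetical toy emails standing in for the Enron dataset (not from the paper).
EMAILS = [
    {"subject": "Meeting on Friday", "body": "Please call Jane Doe at 555-0100."},
    {"subject": "Quarterly report", "body": "Send the draft to john.smith@example.com."},
]


def to_finetune_records(emails):
    """Map each email to a prompt/completion pair: subject -> body."""
    return [{"prompt": e["subject"], "completion": e["body"]} for e in emails]


def to_jsonl(records):
    """Serialize records as JSON Lines, one JSON object per line."""
    return "\n".join(json.dumps(r) for r in records)


def leaked_pii(known_pii, generations):
    """Return the known PII strings that appear verbatim in any generation.

    Matching is case-insensitive substring search, an assumption made
    for illustration only; a real evaluation may need stricter rules.
    """
    text = "\n".join(generations).lower()
    return {p for p in known_pii if p.lower() in text}
```

In use, `known_pii` would be the set of names, phone numbers, or addresses known to occur in the fine-tuning data, and the size of the returned set gives a simple count of how many of them the model reproduced.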

Generated Data with Fake Privacy: Hidden Dangers of Fine-tuning Large Language Models on Generated Data

The Janus Interface: How Fine-Tuning in Large Language Models Amplifies the Privacy Risks

PreCurious: How Innocent Pre-Trained Language Models Turn into Privacy Traps

TMI! Finetuned Models Leak Private Information from their Pretraining Data

Analyzing Leakage of Personally Identifiable Information in Language Models

Controlling the Extraction of Memorized Data from Large Language Models via Prompt-Tuning

Exploiting Novel GPT-4 APIs

Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models

Leak, Cheat, Repeat: Data Contamination and Evaluation Malpractices in Closed-Source LLMs

Privacy Re-identification Attacks on Tabular GANs

Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage

Tunable Privacy Risk Evaluation of Generative Adversarial Networks

Reconstruct Your Previous Conversations! Comprehensively Investigating Privacy Leakage Risks in Conversations with GPT Models

Memorization of Named Entities in Fine-tuned BERT Models

Are Large Pre-Trained Language Models Leaking Your Personal Information?

Combing for Credentials: Active Pattern Extraction from Smart Reply

Removing RLHF Protections in GPT-4 via Fine-Tuning

GPT in Sheep's Clothing: The Risk of Customized GPTs

Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation

User Inference Attacks on Large Language Models