Controlling the Extraction of Memorized Data from Large Language Models via Prompt-Tuning

Mustafa Safa Ozdayi, Charith Peris, Jack FitzGerald, Christophe Dupuy, Jimit Majmudar, Haidar Khan, Rahil Parikh, Rahul Gupta
2023-05-19
Abstract: Large Language Models (LLMs) are known to memorize significant portions of their training data. Parts of this memorized content have been shown to be extractable by simply querying the model, which poses a privacy risk. We present a novel approach which uses prompt-tuning to control the extraction rates of memorized content in LLMs. We present two prompt training strategies to increase and decrease extraction rates, which correspond to an attack and a defense, respectively. We demonstrate the effectiveness of our techniques by using models from the GPT-Neo family on a public benchmark. For the 1.3B parameter GPT-Neo model, our attack yields a 9.3 percentage point increase in extraction rate compared to our baseline. Our defense can be tuned to achieve different privacy-utility trade-offs by a user-specified hyperparameter. We achieve an extraction rate reduction of up to 97.7% relative to our baseline, with a perplexity increase of 16.9%.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper primarily focuses on the issue of data memorization in large language models (LLMs) and proposes a novel method to control the extraction rate of memorized data in these models. Specifically, the paper addresses the following key issues:
1. **Problem Background**: Large language models memorize a significant amount of training data during the training process, which can lead to the leakage of private information through simple queries, posing a privacy risk.
2. **Research Objective**: Develop a method to control the efficiency of extracting memorized data from large language models, aiming to both increase the extraction rate for attack testing and decrease the extraction rate as a defensive measure.
3. **Solution**: The paper proposes using prompt-tuning to achieve the aforementioned goals. This method includes two strategies:
   - **Attack Strategy**: Increase the extraction rate of memorized content to assess potential privacy risks.
   - **Defense Strategy**: Decrease the extraction rate of memorized content to protect the model from attacks.
4. **Technical Details**: The authors use continuous soft prompts and adjust these prompts to control the model's behavior. In the attack setting, prompts are optimized to increase the generation probability of specific sequences; in the defense setting, a "learning threshold" is introduced to train the prompts, thereby reducing the leakage of sensitive data.
5. **Experimental Results**: The paper demonstrates that for different sizes of GPT-Neo models, adjusting parameters such as prompt length and suffix size can effectively increase or decrease the extraction rate of memorized data. Additionally, the impact of different hyperparameters on the defense effectiveness is discussed, as well as the effectiveness of the proposed defense method compared to baseline models not trained on the same dataset.
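The attack and defense objectives described under Technical Details can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the summary does not say how the "learning threshold" enters the loss, so the sign flip below is an assumption, and the function names (`suffix_loss`, `attack_objective`, `defense_objective`) and the input probabilities are hypothetical. In a real setup the probabilities would come from the frozen LLM with the soft prompt prepended, and only the soft prompt would be updated.

```python
import math


def suffix_loss(token_probs):
    """Mean negative log-likelihood of the true suffix tokens.

    token_probs: probability the model assigns to each true suffix token.
    (Hypothetical input; a real run would read these from the model's logits.)
    """
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def attack_objective(token_probs):
    # Attack: minimize the suffix loss, making the memorized suffix more likely.
    return suffix_loss(token_probs)


def defense_objective(token_probs, threshold):
    # Defense (assumed form): minimize the loss as usual while it is at or
    # above the threshold, but flip its sign once it falls below, so that
    # minimizing the objective pushes the loss back up toward the threshold.
    loss = suffix_loss(token_probs)
    return loss if loss >= threshold else -loss


if __name__ == "__main__":
    memorized = [0.9, 0.8, 0.95]    # model is confident about the true suffix
    unmemorized = [0.1, 0.2, 0.05]  # model is unsure about the true suffix
    print(attack_objective(memorized))
    print(defense_objective(memorized, threshold=1.0))
    print(defense_objective(unmemorized, threshold=1.0))
```

Under this reading, a larger threshold keeps the loss on memorized sequences higher, which would lower extraction at the cost of utility. That matches the privacy-utility trade-off the abstract attributes to the user-specified hyperparameter.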
In summary, the paper shows that prompt-tuning can both raise and lower the extraction of memorized data from large language models, which makes it useful for assessing and for mitigating privacy risks.
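The abstract reports the attack gain in percentage points and the defense gain as a relative reduction, two measures that are easy to confuse. The sketch below shows how they differ. It assumes that the extraction rate is the fraction of examples whose generated suffix exactly matches the true suffix, a definition the text above does not spell out. The function names and the toy token sequences are hypothetical and are not results from the paper.

```python
def extraction_rate(generated, reference):
    """Fraction of examples whose generated suffix equals the true suffix.

    Exact match is an assumed definition of "extracted".
    """
    if len(generated) != len(reference):
        raise ValueError("generated and reference must have the same length")
    if not reference:
        return 0.0
    hits = sum(g == r for g, r in zip(generated, reference))
    return hits / len(reference)


def percentage_point_change(new_rate, base_rate):
    # Absolute difference between two rates, expressed in percentage points.
    return (new_rate - base_rate) * 100.0


def relative_reduction(new_rate, base_rate):
    # Reduction as a percentage of the baseline rate.
    return (base_rate - new_rate) / base_rate * 100.0


if __name__ == "__main__":
    # Toy token-id suffixes, invented for illustration only.
    truth = [[1, 2], [3, 4], [5, 6], [7, 8]]
    baseline = [[1, 2], [3, 4], [0, 0], [0, 0]]   # 2 of 4 extracted
    attacked = [[1, 2], [3, 4], [5, 6], [0, 0]]   # 3 of 4 extracted
    defended = [[1, 2], [0, 0], [0, 0], [0, 0]]   # 1 of 4 extracted

    base = extraction_rate(baseline, truth)
    print(percentage_point_change(extraction_rate(attacked, truth), base))
    print(relative_reduction(extraction_rate(defended, truth), base))
```

In this toy case the attack raises the rate from 50% to 75%, a 25 percentage point increase, while the defense lowers it from 50% to 25%, a 50% relative reduction.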