Abstract: Large Language Models (LLMs) are known to memorize significant portions of their training data. Parts of this memorized content have been shown to be extractable by simply querying the model, which poses a privacy risk. We present a novel approach which uses prompt-tuning to control the extraction rates of memorized content in LLMs. We present two prompt training strategies to increase and decrease extraction rates, which correspond to an attack and a defense, respectively. We demonstrate the effectiveness of our techniques by using models from the GPT-Neo family on a public benchmark. For the 1.3B parameter GPT-Neo model, our attack yields a 9.3 percentage point increase in extraction rate compared to our baseline. Our defense can be tuned to achieve different privacy-utility trade-offs by a user-specified hyperparameter. We achieve an extraction rate reduction of up to 97.7% relative to our baseline, with a perplexity increase of 16.9%.
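
The abstract reports the attack gain in percentage points (an absolute difference between two rates) and the defense gain as a relative reduction (a fraction of the baseline rate). The short sketch below illustrates the difference between the two units. The baseline extraction rate of 0.50 is a hypothetical value chosen for illustration only, since the abstract does not state it, and the function names are likewise illustrative.

```python
def percentage_point_change(baseline_rate, new_rate):
    """Absolute difference between two rates (given as fractions), in percentage points."""
    return (new_rate - baseline_rate) * 100.0


def relative_change(baseline, new):
    """Change relative to the baseline, in percent (negative means a reduction)."""
    return (new - baseline) / baseline * 100.0


# Hypothetical baseline extraction rate; NOT a figure from the paper.
baseline = 0.50
attacked = baseline + 0.093        # a 9.3 percentage point increase
defended = baseline * (1 - 0.977)  # a 97.7% reduction relative to the baseline

print(percentage_point_change(baseline, attacked))
print(relative_change(baseline, defended))
```

Under this assumed baseline, the same defense result could also be stated as a drop in percentage points, which is why the two units should not be compared directly.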

What problem does this paper attempt to address?

The paper primarily focuses on the issue of data memorization in large language models (LLMs) and proposes a novel method to control the extraction rate of memorized data in these models. Specifically, the paper addresses the following key issues:

1. **Problem Background**: Large language models memorize a significant amount of training data during the training process, which can lead to the leakage of private information through simple queries, posing a privacy risk.
2. **Research Objective**: Develop a method to control the efficiency of extracting memorized data from large language models, aiming to both increase the extraction rate for attack testing and decrease the extraction rate as a defensive measure.
3. **Solution**: The paper proposes using prompt-tuning to achieve the aforementioned goals. This method includes two strategies:
   - **Attack Strategy**: Increase the extraction rate of memorized content to assess potential privacy risks.
   - **Defense Strategy**: Decrease the extraction rate of memorized content to protect the model from attacks.
4. **Technical Details**: The authors use continuous soft prompts and adjust these prompts to control the model's behavior. In the attack setting, prompts are optimized to increase the generation probability of specific sequences; in the defense setting, a "learning threshold" is introduced to train the prompts, thereby reducing the leakage of sensitive data.
5. **Experimental Results**: The paper demonstrates that for different sizes of GPT-Neo models, adjusting parameters such as prompt length and suffix size can effectively increase or decrease the extraction rate of memorized data. Additionally, the impact of different hyperparameters on the defense effectiveness is discussed, as well as the effectiveness of the proposed defense method compared to baseline models not trained on the same dataset.

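
The attack and defense objectives described under Technical Details can be illustrated with a toy sketch. Everything here is an assumption made for illustration and not the authors' implementation: the "model" is a fixed logit vector over a four-token vocabulary, the soft prompt is modeled as an additive bias on those logits, and the threshold rule (ascend on the loss while it is below `theta`, descend otherwise) is one plausible reading of the "learning threshold". The names `loss_and_grad`, `tune_prompt`, and `theta` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def loss_and_grad(frozen_logits, prompt, target):
    """Cross-entropy of the target token and its gradient w.r.t. the prompt.

    Toy assumption: the trainable prompt is an additive bias on the frozen
    model's logits; the frozen logits themselves are never updated.
    """
    probs = softmax([f + p for f, p in zip(frozen_logits, prompt)])
    loss = -math.log(probs[target])
    grad = [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]
    return loss, grad


def tune_prompt(frozen_logits, target, theta=None, lr=0.1, steps=200):
    """Train only the prompt. theta=None is the attack; otherwise the defense."""
    prompt = [0.0] * len(frozen_logits)
    for _ in range(steps):
        loss, grad = loss_and_grad(frozen_logits, prompt, target)
        # Attack: always descend, making the memorized token more likely.
        # Defense: ascend while the loss is below theta, descend otherwise,
        # so the loss settles near theta instead of growing without bound.
        sign = -1.0 if (theta is not None and loss < theta) else 1.0
        prompt = [p - sign * lr * g for p, g in zip(prompt, grad)]
    final_loss, _ = loss_and_grad(frozen_logits, prompt, target)
    return prompt, final_loss


# The frozen "model" strongly prefers token 0, standing in for a memorized suffix token.
frozen = [2.0, 0.0, 0.0, 0.0]
base_loss, _ = loss_and_grad(frozen, [0.0] * 4, 0)
_, attack_loss = tune_prompt(frozen, 0)
_, defense_loss = tune_prompt(frozen, 0, theta=1.5)
print(base_loss, attack_loss, defense_loss)
```

In this toy setting, a larger `theta` pushes the memorized token's loss higher, which mirrors the privacy-utility trade-off controlled by the user-specified hyperparameter; in a real model the prompt would be a sequence of trainable embeddings prepended to the input while the model weights stay frozen.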
In summary, this paper provides an innovative method that can effectively control the extraction of memorized data in large language models, which is significant for assessing and mitigating potential privacy risks.

Controlling the Extraction of Memorized Data from Large Language Models via Prompt-Tuning

Why Are My Prompts Leaked? Unraveling Prompt Extraction Threats in Customized Large Language Models

DP-OPT: Make Large Language Model Your Privacy-Preserving Prompt Engineer

Privacy-Preserving Prompt Tuning for Large Language Model Services

The Janus Interface: How Fine-Tuning in Large Language Models Amplifies the Privacy Risks

Unlocking Memorization in Large Language Models with Dynamic Soft Prompting

Does Prompt-Tuning Language Model Ensure Privacy?

Counterfactual Explainable Incremental Prompt Attack Analysis on Large Language Models

Prompt Stealing Attacks Against Large Language Models

Fight Back Against Jailbreaking via Prompt Adversarial Tuning

Preventing Verbatim Memorization in Language Models Gives a False Sense of Privacy

On the Privacy Risk of In-context Learning

Alpaca against Vicuna: Using LLMs to Uncover Memorization of LLMs

Efficient and Private: Memorisation under differentially private parameter-efficient fine-tuning in language models

Scalable Extraction of Training Data from (Production) Language Models

AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs

Measuring memorization through probabilistic discoverable extraction

PRSA: PRompt Stealing Attacks against Large Language Models

Bag of Tricks for Training Data Extraction from Language Models

Effective Prompt Extraction from Language Models

Mitigating Memorization In Language Models