Can Language Models be Instructed to Protect Personal Information?

Yang Chen, Ethan Mendes, Sauvik Das, Wei Xu, Alan Ritter
2023-10-04
Abstract: Large multimodal language models have proven transformative in numerous applications. However, these models have been shown to memorize and leak pre-training data, raising serious user privacy and information security concerns. While data leaks should be prevented, it is also crucial to examine the trade-off between the privacy protection and model utility of proposed approaches. In this paper, we introduce PrivQA -- a multimodal benchmark to assess this privacy/utility trade-off when a model is instructed to protect specific categories of personal information in a simulated scenario. We also propose a technique to iteratively self-moderate responses, which significantly improves privacy. However, through a series of red-teaming experiments, we find that adversaries can also easily circumvent these protections with simple jailbreaking methods through textual and/or image inputs. We believe PrivQA has the potential to support the development of new models with improved privacy protections, as well as the adversarial robustness of these protections. We release the entire PrivQA dataset at https://llm-access-control.github.io/.
Computation and Language
What problem does this paper attempt to address?
This paper addresses the challenge of protecting personal privacy in large multimodal language models. Although these models perform well in many applications, they also memorize and leak pre-training data, which raises serious concerns about user privacy and information security. The main objectives of the paper are:

1. **Evaluate the trade-off between privacy and utility**: Study how to preserve the model's usefulness while protecting certain categories of personal information. To this end, the authors introduce **PrivQA**, a multimodal benchmark for evaluating the trade-off between privacy and utility when the model is instructed to protect certain types of personal information.
2. **Propose a self-moderation technique**: To improve the model's ability to protect privacy, the authors propose an iterative self-moderation technique, which gradually improves privacy protection by guiding the model to check and authorize its own responses.
3. **Explore the vulnerability to adversarial attacks**: Through a series of red-teaming experiments, the authors find that even with these protection measures, the model is still vulnerable to simple jailbreaking methods, which can bypass the protection mechanism through text or image inputs.
4. **Analyze bias and robustness issues**: The results show that although recent API-based models (such as GPT-4) protect personal data better than open-source large language models (such as LLaMA), significant bias remains in practice: the models provide less protection for more private or less well-known individuals.

### Main contributions

- **Provide the first open benchmark**: Standardize the evaluation of how well language and vision models follow instructions to protect personal information.
- **Introduce a self-moderation technique**: Improve the model's ability to follow access-control instructions and show how the protection differs across groups.
- **Reveal the vulnerability to adversarial techniques**: Through a series of red-teaming exercises, show that access-control instructions in state-of-the-art models can be easily bypassed.

### Conclusion

By introducing the PrivQA benchmark and the self-moderation technique, the paper provides new tools and methods for evaluating and improving the privacy protection of large language models. However, the research also points out the shortcomings of existing methods in terms of bias and robustness, highlighting key issues for future research.
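
The abstract describes a technique in which the model iteratively self-moderates its responses. A minimal sketch of that general idea in Python might look as follows. It is not the authors' implementation: the prompt wording, the `REFUSAL` string, the `max_rounds` parameter, and the `toy_model` stand-in for a real language model call are all assumptions made for illustration.

```python
from typing import Callable

# Hypothetical refusal message; the paper's actual wording is not given here.
REFUSAL = "I cannot answer that."


def self_moderate(
    question: str,
    instruction: str,
    generate: Callable[[str], str],
    max_rounds: int = 3,
) -> str:
    """Draft an answer, then repeatedly ask the model to re-check the draft."""
    draft = generate(f"{instruction}\n\nQuestion: {question}")
    for _ in range(max_rounds):
        if draft == REFUSAL:
            break  # already refused, nothing left to moderate
        # Assumed review prompt: ask the same model to judge its own draft.
        verdict = generate(
            f"{instruction}\n\nQuestion: {question}\nDraft answer: {draft}\n"
            "Does the draft reveal protected information? Answer yes or no."
        )
        if verdict.strip().lower().startswith("yes"):
            draft = REFUSAL
    return draft


def toy_model(prompt: str) -> str:
    """Deterministic stand-in for an LLM: treats 'Alice' as a protected person."""
    body = prompt.split("Question:", 1)[1]
    about_alice = "Alice" in body
    if "Draft answer:" in body:
        # Review step: flag any exchange that mentions the protected person.
        return "yes" if about_alice else "no"
    # Drafting step: the unmoderated model answers everything.
    return "Alice lives in Paris." if about_alice else "Paris is in France."
```

Passing the model as a plain callable keeps the loop independent of any particular API. With a real model the verdict can differ between rounds, which is the reason for repeating the check instead of asking once.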