Abstract: Large multimodal language models have proven transformative in numerous applications. However, these models have been shown to memorize and leak pre-training data, raising serious user privacy and information security concerns. While data leaks should be prevented, it is also crucial to examine the trade-off between the privacy protection and model utility of proposed approaches. In this paper, we introduce PrivQA -- a multimodal benchmark to assess this privacy/utility trade-off when a model is instructed to protect specific categories of personal information in a simulated scenario. We also propose a technique to iteratively self-moderate responses, which significantly improves privacy. However, through a series of red-teaming experiments, we find that adversaries can also easily circumvent these protections with simple jailbreaking methods through textual and/or image inputs. We believe PrivQA has the potential to support the development of new models with improved privacy protections, as well as the adversarial robustness of these protections. We release the entire PrivQA dataset at https://llm-access-control.github.io/.

What problem does this paper attempt to address?

This paper addresses the difficulty of protecting personal privacy in large multimodal language models. Although these models perform well in many applications, they also memorize and leak pre-training data, which raises serious concerns about user privacy and information security. The main objectives of the paper are:

1. **Evaluate the trade-off between privacy and utility**: Study how to preserve a model's usefulness while protecting specific categories of personal information. To this end, the authors introduce **PrivQA**, a multimodal benchmark for evaluating the privacy/utility trade-off when a model is instructed to protect specific categories of personal information.
2. **Propose a self-moderation technique**: To improve the model's ability to protect privacy, the authors propose an iterative self-moderation technique, which improves privacy protection step by step by guiding the model to check and authorize its own responses.
3. **Explore the vulnerability to adversarial attacks**: Through a series of red-teaming experiments, the authors find that even with these protections in place, the model remains vulnerable to simple jailbreaking methods that bypass the protection mechanism through text or image inputs.
4. **Analyze bias and robustness issues**: The results show that although the latest API models (such as GPT-4) outperform open-source large language models (such as LLaMA) at protecting personal data, significant bias remains in practice: more private or less well-known individuals receive less protection.

### Main contributions

- **Provide the first open benchmark**: Standardize the evaluation of how well language and vision models follow instructions to protect personal information.
- **Introduce a self-moderation technique**: Improve the model's ability to follow access-control instructions and show how the protection it provides differs across groups.
- **Reveal the vulnerability to adversarial techniques**: Through a series of red-teaming exercises, show that access-control instructions in state-of-the-art models can be easily bypassed.

### Conclusion

By introducing the PrivQA benchmark and the self-moderation technique, the paper provides new tools and methods for evaluating and improving the privacy protection of large language models. It also identifies shortcomings of existing methods in bias and robustness, which future research needs to address.
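The idea of having a model iteratively check and authorize its own responses can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the refusal message, the fixed number of rounds, and the rule "refuse if any round flags the draft" are all assumptions made for illustration, and the toy stand-ins replace what would be real model calls.

```python
from typing import Callable

# Hypothetical refusal message; the paper's exact wording is not given here.
REFUSAL = "I can't share that information."


def self_moderate(
    question: str,
    protected_category: str,
    answer_fn: Callable[[str], str],
    flags_fn: Callable[[str, str, str, int], bool],
    max_rounds: int = 3,
) -> str:
    """Draft an answer, then re-check it for protected content over several rounds.

    Assumption: the draft is withheld as soon as any round flags it.
    """
    draft = answer_fn(question)
    for round_idx in range(max_rounds):
        if flags_fn(question, draft, protected_category, round_idx):
            return REFUSAL
    return draft


# Toy stand-ins for model calls (hypothetical, for demonstration only).
TOY_FACTS = {
    "Where does Alice live?": "Alice lives in Springfield.",
    "What is the capital of France?": "Paris is the capital of France.",
}


def toy_answer(question: str) -> str:
    """Look up a canned answer instead of querying a model."""
    return TOY_FACTS.get(question, "I don't know.")


def toy_flags(question: str, draft: str, category: str, round_idx: int) -> bool:
    """Crude keyword check standing in for the model's self-assessment."""
    keywords = {"location": ("lives in",)}
    return any(k in draft for k in keywords.get(category, ()))


if __name__ == "__main__":
    print(self_moderate("Where does Alice live?", "location", toy_answer, toy_flags))
    print(self_moderate("What is the capital of France?", "location", toy_answer, toy_flags))
```

In a real system, `answer_fn` and `flags_fn` would both be prompts to the same model, which is also why such a check could be bypassed by the jailbreaking inputs the paper describes.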

Can Language Models be Instructed to Protect Personal Information?

Exploring the Privacy Protection Capabilities of Chinese Large Language Models

Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory

Training Data Leakage Analysis in Language Models

Large Language Models Can Be Good Privacy Protection Learners

Are Large Pre-Trained Language Models Leaking Your Personal Information?

Identifying and Mitigating Privacy Risks Stemming from Language Models: A Survey

Beyond Memorization: Violating Privacy Via Inference with Large Language Models

You Don't Know My Favorite Color: Preventing Dialogue Representations from Revealing Speakers' Private Personas

PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners

Teach LLMs to Phish: Stealing Private Information from Language Models

Security and Privacy Challenges of Large Language Models: A Survey

Privacy Risks of General-Purpose Language Models

On Protecting the Data Privacy of Large Language Models (LLMs): A Survey

The Phantom Menace: Unmasking Privacy Leakages in Vision-Language Models

Privacy-Aware Visual Language Models

Privacy in Large Language Models: Attacks, Defenses and Future Directions

How Privacy-Savvy Are Large Language Models? A Case Study on Compliance and Privacy Technical Review

Learning to Refuse: Towards Mitigating Privacy Risks in LLMs

Unique Security and Privacy Threats of Large Language Model: A Comprehensive Survey