Abstract: Recently, a new paradigm has arisen in Natural Language Processing (NLP): building general-purpose language models (e.g., Google's Bert and OpenAI's GPT-2) for text feature extraction, a standard procedure in NLP systems that converts texts to vectors (i.e., embeddings) for downstream modeling. This paradigm is starting to find applications in various downstream NLP tasks and real-world systems (e.g., Google's search engine [6]). To obtain general-purpose text embeddings, these language models have highly complicated architectures with millions of learnable parameters and are usually pretrained on billions of sentences before being used. As is widely recognized, this practice improves the state-of-the-art performance of many downstream NLP tasks. However, the improved utility does not come for free. We find that the text embeddings from general-purpose language models capture a great deal of sensitive information from the plain text. Once accessed by an adversary, the embeddings can be reverse-engineered to disclose sensitive information about the victims, enabling further harassment. Although such a privacy risk poses a real threat to the future use of these promising NLP tools, there have so far been neither published attacks nor systematic evaluations for mainstream industry-level language models. To bridge this gap, we present the first systematic study of the privacy risks of 8 state-of-the-art language models with 4 diverse case studies. By constructing 2 novel attack classes, our study demonstrates that these privacy risks do exist and can pose practical threats to the application of general-purpose language models to sensitive data covering identity, genome, healthcare and location. For example, we show that an adversary with nearly no prior knowledge can achieve about 75% accuracy when inferring the precise disease site from Bert embeddings of patients' medical descriptions.
As possible countermeasures, we propose 4 different defenses (via rounding, differential privacy, adversarial training and subspace projection) to obfuscate the unprotected embeddings and mitigate these risks. With extensive evaluations, we also provide a preliminary analysis of the utility-privacy trade-off brought by each defense, which we hope may foster future mitigation research.

Understanding Privacy Risks of Embeddings Induced by Large Language Models

Information Leakage from Embedding in Large Language Models

Privacy in Large Language Models: Attacks, Defenses and Future Directions

Exploring the Privacy Protection Capabilities of Chinese Large Language Models

On Protecting the Data Privacy of Large Language Models (LLMs): A Survey

Mitigating Privacy Risks in LLM Embeddings from Embedding Inversion

Privacy Risks of General-Purpose Language Models.

A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly

Privacy Preservation of Large Language Models in the Metaverse Era: Research Frontiers, Categorical Comparisons, and Future Directions

Privacy-Preserving Large Language Models: Mechanisms, Applications, and Future Directions

Privacy-Preserving Large Language Models (PPLLMs)

Preserving Privacy in Large Language Models: A Survey on Current Threats and Solutions

"I Always Felt that Something Was Wrong.": Understanding Compliance Risks and Mitigation Strategies when Professionals Use Large Language Models

Embedding Large Language Models into Extended Reality: Opportunities and Challenges for Inclusion, Engagement, and Privacy

Beyond Memorization: Violating Privacy Via Inference with Large Language Models

Unique Security and Privacy Threats of Large Language Model: A Comprehensive Survey

Large Language Models Can Be Good Privacy Protection Learners

Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities

Misinforming LLMs: vulnerabilities, challenges and opportunities

Identifying and Mitigating Privacy Risks Stemming from Language Models: A Survey