Abstract: Recently, a new paradigm has arisen in Natural Language Processing (NLP): building general-purpose language models (e.g., Google's Bert and OpenAI's GPT-2) for text feature extraction, a standard procedure in NLP systems that converts texts to vectors (i.e., embeddings) for downstream modeling. These models are starting to find application in various downstream NLP tasks and real-world systems (e.g., Google's search engine [6]). To obtain general-purpose text embeddings, these language models have highly complicated architectures with millions of learnable parameters and are usually pretrained on billions of sentences before being used. As is widely recognized, this practice improves the state-of-the-art performance of many downstream NLP tasks. However, the improved utility does not come for free. We find that the text embeddings from general-purpose language models capture much sensitive information from the plain text. Once accessed by an adversary, the embeddings can be reverse-engineered to disclose sensitive information about the victims, enabling further harassment. Although such a privacy risk poses a real threat to the future use of these promising NLP tools, there have so far been neither published attacks nor systematic evaluations for the mainstream industry-level language models. To bridge this gap, we present the first systematic study of the privacy risks of 8 state-of-the-art language models with 4 diverse case studies. By constructing 2 novel attack classes, our study demonstrates that these privacy risks do exist and can pose practical threats to the application of general-purpose language models to sensitive data covering identity, genome, healthcare and location. For example, we show that an adversary with nearly no prior knowledge can achieve about 75% accuracy when inferring the precise disease site from Bert embeddings of patients' medical descriptions.
As possible countermeasures, we propose 4 different defenses (via rounding, differential privacy, adversarial training and subspace projection) that obfuscate the unprotected embeddings to mitigate the risk. With extensive evaluations, we also provide a preliminary analysis of the utility-privacy trade-off brought by each defense, which we hope will foster future research on mitigation.
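To make the idea of obfuscating an embedding concrete, the following is a minimal sketch of two of the simpler defense families named above: rounding and noise addition. It is an illustration under stated assumptions, not the paper's implementation. The function names, the `decimals` and `scale` parameters, and the choice of Laplace noise are all hypothetical, and adding noise this way does not by itself establish a differential privacy guarantee, which would require calibrating the scale to a sensitivity bound and a privacy budget.

```python
import random


def round_embedding(embedding, decimals=1):
    """Rounding sketch (hypothetical helper): keep only `decimals`
    decimal places in each coordinate of the embedding."""
    return [round(x, decimals) for x in embedding]


def laplace_noise_embedding(embedding, scale=0.1, seed=None):
    """Noise sketch (hypothetical helper): add i.i.d. Laplace(0, scale)
    noise to each coordinate. The scale is an assumed free parameter,
    not calibrated to any privacy budget."""
    rng = random.Random(seed)
    rate = 1.0 / scale
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace distribution with location 0 and that scale.
    return [x + rng.expovariate(rate) - rng.expovariate(rate) for x in embedding]


if __name__ == "__main__":
    emb = [0.1234, -0.5678, 0.9012]  # toy vector standing in for a real embedding
    print(round_embedding(emb, decimals=2))
    print(laplace_noise_embedding(emb, scale=0.05, seed=0))
```

Coarser rounding or a larger noise scale hides more of the original vector but also degrades whatever downstream task consumes it, which is the utility-privacy trade-off the abstract refers to.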

Training Data Leakage Analysis in Language Models

Teach LLMs to Phish: Stealing Private Information from Language Models

Can Language Models be Instructed to Protect Personal Information?

Inside the Black Box: Detecting Data Leakage in Pre-trained Language Encoders

Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage

Identifying and Mitigating Privacy Risks Stemming from Language Models: A Survey

Analyzing Leakage of Personally Identifiable Information in Language Models

A Little Leak Will Sink a Great Ship: Survey of Transparency for Large Language Models from Start to Finish

Extracting Training Data from Large Language Models

Analysis of Privacy Leakage in Federated Large Language Models

Survey: Leakage and Privacy at Inference Time

Data Stealing Attacks against Large Language Models via Backdooring

Are Large Pre-Trained Language Models Leaking Your Personal Information?

Information Leakage from Embedding in Large Language Models

Special Characters Attack: Toward Scalable Training Data Extraction From Large Language Models

Unique Security and Privacy Threats of Large Language Model: A Comprehensive Survey

The Phantom Menace: Unmasking Privacy Leakages in Vision-Language Models

Seeing the Forest through the Trees: Data Leakage from Partial Transformer Gradients

Privacy Risks of General-Purpose Language Models

What can we learn from Data Leakage and Unlearning for Law?

Training Data Extraction From Pre-trained Language Models: A Survey