Abstract: This study investigates the privacy risks associated with text embeddings, focusing on the scenario where attackers cannot access the original embedding model. Contrary to previous research requiring direct model access, we explore a more realistic threat model by developing a transfer attack method. This approach uses a surrogate model to mimic the victim model's behavior, allowing the attacker to infer sensitive information from text embeddings without direct access. Our experiments across various embedding models and a clinical dataset demonstrate that our transfer attack significantly outperforms traditional methods, revealing the potential privacy vulnerabilities in embedding technologies and emphasizing the need for enhanced security measures.

What problem does this paper attempt to address?

### Problems Addressed by the Paper

This paper aims to explore the privacy risks present in text embeddings, particularly in scenarios where attackers cannot directly access the original embedding model. Specifically, the focus of the research is on developing a transfer attack method, which uses a surrogate model to mimic the behavior of the target model, thereby inferring sensitive information from text embeddings without direct query access.

### Background and Motivation

1. **Widespread Application of Text Embeddings**:
   - Text embeddings serve as a universal representation of text data and can be used for various downstream tasks.
   - Large Language Models (LLMs) often combine text embeddings with vector databases to store and inject domain-specific knowledge or auxiliary data.
   - Retrieval-Augmented Generation (RAG) is a typical application that enhances the knowledge of LLMs by incorporating retrieved documents into the model's prompts.
2. **Introduction of Privacy Risks**:
   - Although many platforms claim that storing embeddings is secure, is sending text embeddings to online services truly free of privacy risks?
   - Embedding inversion attacks aim to reconstruct input data from embeddings, and existing research has shown that such attacks are feasible in both image and text domains.
3. **Limitations of Existing Research**:
   - Existing research primarily assumes that attackers can query the embedding model, which may not hold in real-world scenarios.
   - For instance, in data breach incidents, attackers might only obtain a small number of documents and their embeddings without the ability to query the model.

### Main Contributions of the Paper

1. **Black-Box Attack Model**:
   - This paper considers a black-box attack scenario where the target victim model is completely hidden from the attacker.
   - In this setting, standard white-box attacks or query-based black-box attacks become ineffective.
2. **Transfer Attack Method**:
   - A transfer attack method is proposed, utilizing a surrogate model to simulate the behavior of the victim model.
   - The transfer attack has two objectives:
     - **Objective 1 (Stealing the Text Encoder)**: Learning a surrogate model from the returned representations to make it as close as possible to the victim model.
     - **Objective 2 (Transferability of the Threat Model)**: Constructing a threat model by attacking the surrogate model and hoping that this threat model can also successfully deceive the victim black-box model.
3. **Experimental Validation**:
   - Extensive experiments were conducted on several popular embedding models, including Sentence-BERT, Sentence-T5, and OpenAI text embeddings.
   - Experimental results show that transfer attacks are 40%-50% more effective than standard attack methods.
   - A case study on the MIMIC-III clinical notes dataset demonstrated that transfer attacks could identify sensitive attributes (such as age, gender, diseases, etc.) with 80%-99% accuracy.

### Conclusion

By developing a new transfer attack method, this paper reveals potential privacy vulnerabilities in text embedding technologies and emphasizes the need for enhanced security measures. Notably, even when attackers cannot directly query the embedding model, they can still effectively conduct attacks using surrogate models.
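To make the two objectives of the transfer attack more concrete, here is a toy sketch in plain Python. It is not the paper's implementation: the linear bag-of-words "victim" encoder, the six-word vocabulary, the SGD fitting routine, and the candidate-matching step are all assumptions chosen so the example runs without external libraries. The attacker only sees leaked (text, embedding) pairs plus one stolen embedding, and never queries the victim.

```python
import math
import random

# Hypothetical toy setup: vocabulary and embedding width are arbitrary choices.
VOCAB = ["male", "female", "diabetes", "asthma", "elderly", "young"]
DIM = 8


def features(text):
    """Bag-of-words counts over VOCAB (an assumed input representation)."""
    tokens = text.lower().split()
    return [float(tokens.count(word)) for word in VOCAB]


def embed(weights, text):
    """Linear encoder: embedding = features(text) @ weights."""
    x = features(text)
    return [sum(x[i] * weights[i][j] for i in range(len(VOCAB))) for j in range(DIM)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    return dot / (math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b)))


def fit_surrogate(leaked_pairs, epochs=2000, lr=0.1):
    """Objective 1 (toy): fit surrogate weights to leaked pairs by SGD on squared error."""
    weights = [[0.0] * DIM for _ in VOCAB]
    for _ in range(epochs):
        for text, target in leaked_pairs:
            x = features(text)
            pred = [sum(x[i] * weights[i][j] for i in range(len(VOCAB))) for j in range(DIM)]
            for i, xi in enumerate(x):
                if xi:
                    for j in range(DIM):
                        weights[i][j] -= lr * xi * (pred[j] - target[j])
    return weights


def infer_text(surrogate, stolen_embedding, candidates):
    """Objective 2 (simplified): pick the candidate whose surrogate embedding matches best."""
    return max(candidates, key=lambda t: cosine(embed(surrogate, t), stolen_embedding))


# Simulation only: these victim weights are hidden from the attacker.
rng = random.Random(0)
victim = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in VOCAB]

# What the breach exposes: a few documents together with their embeddings.
leaked_texts = [
    "male diabetes elderly", "female diabetes elderly", "male asthma elderly",
    "male diabetes young", "female asthma elderly", "male asthma young",
]
leaked_pairs = [(t, embed(victim, t)) for t in leaked_texts]

# A stolen embedding whose source text the attacker does not have.
secret_text = "female asthma young"
stolen_embedding = embed(victim, secret_text)

# The attacker enumerates plausible attribute combinations.
candidates = [
    f"{sex} {disease} {age}"
    for sex in ("male", "female")
    for disease in ("diabetes", "asthma")
    for age in ("elderly", "young")
]

surrogate = fit_surrogate(leaked_pairs)
recovered = infer_text(surrogate, stolen_embedding, candidates)
print("recovered attributes:", recovered)
```

Because the toy victim is linear and the leaked documents span the same feature space as the secret record, the surrogate reproduces the victim closely. Real encoders such as Sentence-BERT are nonlinear, so a practical surrogate and threat model would be far larger, and the match would be approximate rather than exact.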

Transferable Embedding Inversion Attack: Uncovering Privacy Risks in Text Embeddings without Model Queries
