Transferable Embedding Inversion Attack: Uncovering Privacy Risks in Text Embeddings without Model Queries

Yu-Hsiang Huang, Yuche Tsai, Hsiang Hsiao, Hong-Yi Lin, Shou-De Lin
2024-06-12
Abstract: This study investigates the privacy risks associated with text embeddings, focusing on the scenario where attackers cannot access the original embedding model. Contrary to previous research requiring direct model access, we explore a more realistic threat model by developing a transfer attack method. This approach uses a surrogate model to mimic the victim model's behavior, allowing the attacker to infer sensitive information from text embeddings without direct access. Our experiments across various embedding models and a clinical dataset demonstrate that our transfer attack significantly outperforms traditional methods, revealing the potential privacy vulnerabilities in embedding technologies and emphasizing the need for enhanced security measures.
Cryptography and Security, Computation and Language, Machine Learning
What problem does this paper attempt to address?
### Problems Addressed by the Paper

This paper aims to explore the privacy risks present in text embeddings, particularly in scenarios where attackers cannot directly access the original embedding model. Specifically, the focus of the research is on developing a transfer attack method, which uses a surrogate model to mimic the behavior of the target model, thereby inferring sensitive information from text embeddings without direct query access.

### Background and Motivation

1. **Widespread Application of Text Embeddings**:
   - Text embeddings serve as a universal representation of text data and can be used for various downstream tasks.
   - Large Language Models (LLMs) often combine text embeddings with vector databases to store and inject domain-specific knowledge or auxiliary data.
   - Retrieval-Augmented Generation (RAG) is a typical application that enhances the knowledge of LLMs by incorporating retrieved documents into the model's prompts.
2. **Introduction of Privacy Risks**:
   - Although many platforms claim that storing embeddings is secure, is sending text embeddings to online services truly free of privacy risks?
   - Embedding inversion attacks aim to reconstruct input data from embeddings, and existing research has shown that such attacks are feasible in both image and text domains.
3. **Limitations of Existing Research**:
   - Existing research primarily assumes that attackers can query the embedding model, which may not hold in real-world scenarios.
   - For instance, in data breach incidents, attackers might only obtain a small number of documents and their embeddings without the ability to query the model.

### Main Contributions of the Paper

1. **Black-Box Attack Model**:
   - This paper considers a black-box attack scenario where the target victim model is completely hidden from the attacker.
   - In this setting, standard white-box attacks or query-based black-box attacks become ineffective.
2. **Transfer Attack Method**:
   - A transfer attack method is proposed, utilizing a surrogate model to simulate the behavior of the victim model.
   - The transfer attack has two objectives:
     - **Objective 1 (Stealing the Text Encoder)**: Learning a surrogate model from the returned representations to make it as close as possible to the victim model.
     - **Objective 2 (Transferability of the Threat Model)**: Constructing a threat model by attacking the surrogate model and hoping that this threat model can also successfully deceive the victim black-box model.
3. **Experimental Validation**:
   - Extensive experiments were conducted on several popular embedding models, including Sentence-BERT, Sentence-T5, and OpenAI text embeddings.
   - Experimental results show that transfer attacks are 40%-50% more effective than standard attack methods.
   - A case study on the MIMIC-III clinical notes dataset demonstrated that transfer attacks could identify sensitive attributes (such as age, gender, diseases, etc.) with 80%-99% accuracy.

### Conclusion

By developing a new transfer attack method, this paper reveals potential privacy vulnerabilities in text embedding technologies and emphasizes the need for enhanced security measures. Notably, even when attackers cannot directly query the embedding model, they can still effectively conduct attacks using surrogate models.
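
### Illustrative Sketch of the Transfer Attack

The two objectives of the transfer attack described under Main Contributions can be illustrated with a toy sketch. This is not the authors' implementation, and everything in it is an assumption made for illustration: the six-word vocabulary, the linear "victim" encoder, and the helper names `make_victim`, `fit_surrogate`, and `infer_text` are all hypothetical. The victim and the surrogate also share the same simple linear form here, which a real attacker could not rely on. The attacker fits the surrogate only on leaked (text, embedding) pairs, and a nearest-candidate lookup stands in for the paper's trained threat model.

```python
import random

VOCAB = ["male", "female", "diabetes", "asthma", "elderly", "young"]
DIM = 8  # embedding size of the toy victim model (assumption)


def bag_of_words(text):
    """Attacker-side features: word counts over the toy vocabulary."""
    words = text.split()
    return [float(words.count(w)) for w in VOCAB]


def make_victim(seed=0):
    """Hypothetical hidden encoder: the sum of secret per-word vectors."""
    rng = random.Random(seed)
    secret = {w: [rng.gauss(0, 1) for _ in range(DIM)] for w in VOCAB}

    def embed(text):
        vec = [0.0] * DIM
        for w in text.split():
            for i, v in enumerate(secret[w]):
                vec[i] += v
        return vec

    return embed


def fit_surrogate(pairs, lr=0.1, epochs=1000):
    """Objective 1 (toy): fit a linear map from bag-of-words features to
    the leaked embeddings with per-sample gradient steps on squared error."""
    n = len(VOCAB)
    W = [[0.0] * n for _ in range(DIM)]
    for _ in range(epochs):
        for text, target in pairs:
            x = bag_of_words(text)
            for i in range(DIM):
                err = target[i] - sum(W[i][j] * x[j] for j in range(n))
                for j in range(n):
                    W[i][j] += lr * err * x[j]

    def embed(text):
        x = bag_of_words(text)
        return [sum(W[i][j] * x[j] for j in range(n)) for i in range(DIM)]

    return embed


def infer_text(leaked_embedding, surrogate, candidates):
    """Objective 2 (simplified stand-in): return the candidate whose
    surrogate embedding is closest to the leaked victim embedding."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return min(candidates, key=lambda t: sq_dist(surrogate(t), leaked_embedding))


victim = make_victim()  # never queried by the attacker after the leak

# The breach: a handful of documents together with their embeddings.
leaked_texts = [
    "male diabetes elderly",
    "female diabetes elderly",
    "male asthma elderly",
    "male diabetes young",
    "female asthma elderly",
    "male asthma young",
]
leaked_pairs = [(t, victim(t)) for t in leaked_texts]
surrogate = fit_surrogate(leaked_pairs)

# A leaked embedding whose source text the attacker has never seen.
secret_note = "female asthma young"
candidates = [
    f"{sex} {disease} {age}"
    for sex in ("male", "female")
    for disease in ("diabetes", "asthma")
    for age in ("elderly", "young")
]
recovered = infer_text(victim(secret_note), surrogate, candidates)
print(recovered)
```

In this toy setup the surrogate should reproduce the victim's embedding of the unseen note closely enough for the lookup to recover its sensitive attributes, even though the victim is never queried. Real encoders are nonlinear and the space of candidate texts is far larger, so the sketch shows only the shape of the attack, not its difficulty.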