Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems

Zhenting Qi, Hanlin Zhang, Eric Xing, Sham Kakade, Himabindu Lakkaraju

2024-10-07

Abstract: Retrieval-Augmented Generation (RAG) improves pre-trained models by incorporating external knowledge at test time to enable customized adaptation. We study the risk of datastore leakage in Retrieval-In-Context RAG Language Models (LMs). We show that an adversary can exploit LMs' instruction-following capabilities to easily extract text data verbatim from the datastore of RAG systems built with instruction-tuned LMs via prompt injection. The vulnerability exists for a wide range of modern LMs that span Llama2, Mistral/Mixtral, Vicuna, SOLAR, WizardLM, Qwen1.5, and Platypus2, and the exploitability exacerbates as the model size scales up. We also study multiple effects of RAG setup on the extractability of data, indicating that following unexpected instructions to regurgitate data can be an outcome of failure in effectively utilizing contexts for modern LMs, and further show that such vulnerability can be greatly mitigated by position bias elimination strategies. Extending our study to production RAG models GPTs, we design an attack that can cause datastore leakage with a 100% success rate on 25 randomly selected customized GPTs with at most 2 queries, and we extract text data verbatim at a rate of 41% from a book of 77,000 words and 3% from a corpus of 1,569,000 words by prompting the GPTs with only 100 queries generated by themselves.

Computation and Language, Artificial Intelligence, Cryptography and Security, Machine Learning

What problem does this paper attempt to address?

The paper addresses datastore leakage in Retrieval-Augmented Generation (RAG) systems: a Language Model (LM) that uses Retrieval-In-Context (RIC) can be made to reveal the contents of its datastore by following malicious instructions. Specifically, the paper shows how an adversary can exploit the instruction-following capability of LMs, injecting malicious prompts to extract text verbatim from the non-parametric datastore of a RAG system. The study finds that this vulnerability is present across a wide range of modern LMs and that it becomes easier to exploit as model size grows. The paper validates the issue experimentally and analyzes how different factors affect the risk of data leakage, including model size, instruction tuning, and whether the datastore content was seen during pre-training. Additionally, the paper proposes several mitigation strategies, including Safety-Aware Prompts and Position Bias Elimination. These strategies aim to improve the model's ability to distinguish legitimate prompts from malicious ones, thereby reducing the risk of exploitation.
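To make the RIC setting concrete, the sketch below shows why an injected instruction sits in the same prompt as the retrieved text, and one simple way verbatim leakage could be scored. It is an illustration only, not the paper's implementation: the function names `build_ric_prompt` and `verbatim_ratio`, the prompt layout, and the longest-shared-word-run score are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def build_ric_prompt(retrieved_chunks, user_query):
    """Assemble a Retrieval-In-Context prompt (hypothetical layout).

    Retrieved datastore text is simply prepended to the user's query, so
    any instruction inside the query can refer to the text above it.
    """
    context = "\n\n".join(retrieved_chunks)
    return f"{context}\n\n{user_query}"


def verbatim_ratio(model_output, chunk):
    """Fraction of the chunk's words covered by the longest word run
    that appears unchanged in the model output (assumed metric)."""
    out_words = model_output.split()
    chunk_words = chunk.split()
    if not chunk_words:
        return 0.0
    matcher = SequenceMatcher(None, out_words, chunk_words, autojunk=False)
    match = matcher.find_longest_match(0, len(out_words), 0, len(chunk_words))
    return match.size / len(chunk_words)


if __name__ == "__main__":
    chunk = "The secret recipe uses three cups of flour and one egg."
    prompt = build_ric_prompt([chunk], "<user query goes here>")
    print(prompt)

    # Simulated model outputs; no real model is called in this sketch.
    leaky_output = "Sure: The secret recipe uses three cups of flour and one egg."
    benign_output = "The recipe needs flour."
    print(verbatim_ratio(leaky_output, chunk))
    print(verbatim_ratio(benign_output, chunk))
```

A score near 1 means the output reproduces almost the whole retrieved chunk word for word, while a paraphrased or summarized answer scores low. A deployment could use a check of this kind as an output filter, although the threshold and the choice of metric would need tuning against real traffic.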
