Abstract: We are currently witnessing dramatic advances in the capabilities of Large Language Models (LLMs). They are already being adopted in practice and integrated into many systems, including integrated development environments (IDEs) and search engines. The functionalities of current LLMs can be modulated via natural language prompts, while their exact internal functionality remains implicit and unassessable. This property, which makes them adaptable to even unseen tasks, might also make them susceptible to targeted adversarial prompting. Recently, several ways to misalign LLMs using Prompt Injection (PI) attacks have been introduced. In such attacks, an adversary can prompt the LLM to produce malicious content or override the original instructions and the employed filtering schemes. Recent work showed that these attacks are hard to mitigate, as state-of-the-art LLMs are instruction-following. So far, these attacks assumed that the adversary is directly prompting the LLM. In this work, we show that augmenting LLMs with retrieval and API calling capabilities (so-called Application-Integrated LLMs) induces a whole new set of attack vectors. These LLMs might process poisoned content retrieved from the Web that contains malicious prompts pre-injected and selected by adversaries. We demonstrate that an attacker can indirectly perform such PI attacks. Based on this key insight, we systematically analyze the resulting threat landscape of Application-Integrated LLMs and discuss a variety of new attack vectors. To demonstrate the practical viability of our attacks, we implemented specific demonstrations of the proposed attacks within synthetic applications. In summary, our work calls for an urgent evaluation of current mitigation techniques and an investigation of whether new techniques are needed to defend LLMs against these threats.

Imperceptible Content Poisoning in LLM-Powered Applications

Human-Imperceptible Retrieval Poisoning Attacks in LLM-Powered Applications

More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models

Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks

Prompt Injection attack against LLM-integrated Applications

Learning to Poison Large Language Models During Instruction Tuning

Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection

RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content

Denial-of-Service Poisoning Attacks against Large Language Models

The Philosopher's Stone: Trojaning Plugins of Large Language Models

The Dark Side of Human Feedback: Poisoning Large Language Models via User Inputs

Forcing Generative Models to Degenerate Ones: The Power of Data Poisoning Attacks

Protecting Your LLMs with Information Bottleneck

Threat Modelling and Risk Analysis for Large Language Model (LLM)-Powered Applications

Large Language Models are Vulnerable to Bait-and-Switch Attacks for Generating Harmful Content

Data Poisoning for In-context Learning

Persistent Pre-Training Poisoning of LLMs

Can LLMs be Fooled? Investigating Vulnerabilities in LLMs

PoisonBench: Assessing Large Language Model Vulnerability to Data Poisoning

F2A: An Innovative Approach for Prompt Injection by Utilizing Feign Security Detection Agents