Abstract:Large Language Models (LLMs) are increasingly being integrated into various applications. The functionalities of recent LLMs can be flexibly modulated via natural language prompts. This renders them susceptible to targeted adversarial prompting, e.g., Prompt Injection (PI) attacks enable attackers to override original instructions and employed controls. So far, it was assumed that the user is directly prompting the LLM. But, what if it is not the user prompting? We argue that LLM-Integrated Applications blur the line between data and instructions. We reveal new attack vectors, using Indirect Prompt Injection, that enable adversaries to remotely (without a direct interface) exploit LLM-integrated applications by strategically injecting prompts into data likely to be retrieved. We derive a comprehensive taxonomy from a computer security perspective to systematically investigate impacts and vulnerabilities, including data theft, worming, information ecosystem contamination, and other novel security risks. We demonstrate our attacks' practical viability against both real-world systems, such as Bing's GPT-4 powered Chat and code-completion engines, and synthetic applications built on GPT-4. We show how processing retrieved prompts can act as arbitrary code execution, manipulate the application's functionality, and control how and if other APIs are called. Despite the increasing integration and reliance on LLMs, effective mitigations of these emerging threats are currently lacking. By raising awareness of these vulnerabilities and providing key insights into their implications, we aim to promote the safe and responsible deployment of these powerful models and the development of robust defenses that protect users and systems from potential attacks.

What problem does this paper attempt to address?

The problem that this paper attempts to solve is the security threats that emerge in the integrated applications of large - language models (LLMs), especially Indirect Prompt Injection (IPI) attacks. Specifically: 1. **Blurring the Boundary between Data and Instructions**: The paper points out that when LLMs are used in combination with retrieval functions, the boundary between data and instructions becomes blurred. This means that an attacker can indirectly control the behavior of an LLM by injecting malicious prompts into the data that may be retrieved, even without direct access rights. 2. **New Attack Vectors**: Traditional attacks usually assume that users directly manipulate LLMs through natural - language prompts. However, the paper reveals a new attack method - Indirect Prompt Injection, that is, an attacker can remotely (without a direct interface) strategically inject prompts into the data that may be retrieved, thereby exploiting LLM - integrated applications. 3. **Systematic Threat Analysis**: From the perspective of computer security, the paper develops a comprehensive taxonomy and systematically studies the impacts and vulnerabilities brought by Indirect Prompt Injection, including data theft, worm propagation, information ecosystem pollution, and other new security risks. 4. **Verification of Practical Feasibility**: The paper not only proposes the theoretical possibility of attacks but also proves the practical feasibility of these attacks in real - world systems (such as Bing's GPT - 4 chat and code completion engines) and synthetic applications through experiments. 5. **Lack of Effective Mitigation Measures**: Although the integration and dependence of LLMs are continuously increasing, effective mitigation measures for these emerging threats are still insufficient at present. The paper aims to promote the safe and responsible deployment of these powerful models and develop robust defense mechanisms to protect users and systems by raising awareness of these vulnerabilities and providing key insights. In summary, the main contributions of this paper lie in introducing the concept of Indirect Prompt Injection, developing the relevant threat taxonomy, demonstrating the practical feasibility of these attacks, and emphasizing the need to establish more powerful defense measures to deal with these security threats.

Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection

More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models

Prompt Injection attack against LLM-integrated Applications

SoK: Prompt Hacking of Large Language Models

Systematically Analyzing Prompt Injection Vulnerabilities in Diverse LLM Architectures

From Prompt Injections to SQL Injection Attacks: How Protected is Your LLM-Integrated Web Application?

Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks

Imprompter: Tricking LLM Agents into Improper Tool Use

The Ethics of Interaction: Mitigating Security Threats in LLMs

Automatic and Universal Prompt Injection Attacks against Large Language Models

MaPPing Your Model: Assessing the Impact of Adversarial Attacks on LLM-based Programming Assistants

Signed-Prompt: A New Approach to Prevent Prompt Injection Attacks Against LLM-Integrated Applications

Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems

An Early Categorization of Prompt Injection Attacks on Large Language Models

A Study on Prompt Injection Attack Against LLM-Integrated Mobile Robotic Systems

Human-Interpretable Adversarial Prompt Attack on Large Language Models with Situational Context

An LLM can Fool Itself: A Prompt-Based Adversarial Attack

SPML: A DSL for Defending Language Models Against Prompt Attacks

Prompt Leakage effect and defense strategies for multi-turn LLM interactions

Defending Against Indirect Prompt Injection Attacks With Spotlighting

Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models