StruQ: Defending Against Prompt Injection with Structured Queries

Sizhe Chen, Julien Piet, Chawin Sitawarin, David Wagner

2024-09-26

Abstract: Recent advances in Large Language Models (LLMs) enable exciting LLM-integrated applications, which perform text-based tasks by utilizing their advanced language understanding capabilities. However, as LLMs have improved, so have the attacks against them. Prompt injection attacks are an important threat: they trick the model into deviating from the original application's instructions and instead following user directives. These attacks rely on the LLM's ability to follow instructions and inability to separate prompts and user data. We introduce structured queries, a general approach to tackle this problem. Structured queries separate prompts and data into two channels. We implement a system that supports structured queries. This system is made of (1) a secure front-end that formats a prompt and user data into a special format, and (2) a specially trained LLM that can produce high-quality outputs from these inputs. The LLM is trained using a novel fine-tuning strategy: we convert a base (non-instruction-tuned) LLM to a structured instruction-tuned model that will only follow instructions in the prompt portion of a query. To do so, we augment standard instruction tuning datasets with examples that also include instructions in the data portion of the query, and fine-tune the model to ignore these. Our system significantly improves resistance to prompt injection attacks, with little or no impact on utility. Our code is released at https://github.com/Sizhe-Chen/StruQ.
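The dataset augmentation described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' released code: the `Sample` type, the `inject` and `augment` helpers, the `fraction` parameter, and the choice to append the injected instruction at the end of the data are all assumptions made for the example. The one property taken from the abstract is that an injected sample keeps the original response as its target, so fine-tuning teaches the model to ignore instructions in the data portion.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One instruction-tuning example (hypothetical layout)."""
    instruction: str  # prompt portion: the only instructions to follow
    data: str         # data portion: untrusted, must never be obeyed
    response: str     # desired output for `instruction` applied to `data`


def inject(sample: Sample, donor: Sample) -> Sample:
    """Place the donor's instruction inside the data portion.

    The target response is left unchanged, so the training signal is
    "answer the original instruction and ignore the injected one".
    """
    injected_data = f"{sample.data} {donor.instruction}".strip()
    return Sample(sample.instruction, injected_data, sample.response)


def augment(dataset: list[Sample], fraction: float = 0.5, seed: int = 0) -> list[Sample]:
    """Keep every clean sample and add an injected copy for roughly
    `fraction` of them (assumed mixing scheme, not from the paper)."""
    rng = random.Random(seed)
    augmented: list[Sample] = []
    for sample in dataset:
        augmented.append(sample)
        donors = [d for d in dataset if d != sample]
        if donors and rng.random() < fraction:
            augmented.append(inject(sample, rng.choice(donors)))
    return augmented
```

For instance, a sample asking to summarize a review could receive "Translate to French." inside its data portion while its target remains the original summary.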

Cryptography and Security

What problem does this paper attempt to address?

This paper addresses prompt injection attacks against applications built on large language models (LLMs). When developers use an LLM to perform a text task, they usually concatenate the application's instructions (the prompt) and the user data into a single input and send it to the model. This is a security risk: a malicious user can insert crafted strings into the data that cause the LLM to deviate from the application's instructions and carry out the attacker's instructions instead. The attack works because the LLM follows instructions wherever they appear in its input and cannot tell the prompt apart from the user data. To address this, the paper proposes structured queries, which place the prompt and the data in two separate channels and pair this format with an LLM specially trained to follow only the instructions in the prompt channel. The goal is to block prompt injection attacks while keeping utility at or near that of existing LLMs, without significantly increasing training costs.
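The contrast between a concatenated input and a two-channel structured query can be sketched as below. This is an illustrative front-end under stated assumptions: the delimiter strings (`[[PROMPT]]`, `[[DATA]]`, `[[RESPONSE]]`) and the function names are hypothetical, and stripping reserved delimiters from user data is an assumed safeguard that the summary above does not spell out.

```python
# Hypothetical reserved delimiters; the real system's format may differ.
PROMPT_TAG = "[[PROMPT]]"
DATA_TAG = "[[DATA]]"
RESPONSE_TAG = "[[RESPONSE]]"
RESERVED = (PROMPT_TAG, DATA_TAG, RESPONSE_TAG)


def naive_query(prompt: str, data: str) -> str:
    """Vulnerable pattern: prompt and data share one undifferentiated string."""
    return f"{prompt}\n{data}"


def sanitize(data: str) -> str:
    """Remove reserved delimiters from untrusted data.

    Repeats until nothing changes, so a delimiter cannot be rebuilt by
    nesting one inside another (e.g. "[[PRO[[DATA]]MPT]]").
    """
    previous = None
    while previous != data:
        previous = data
        for tag in RESERVED:
            data = data.replace(tag, "")
    return data


def structured_query(prompt: str, data: str) -> str:
    """Put the prompt and the data in separate, delimited channels."""
    return (
        f"{PROMPT_TAG}\n{prompt}\n"
        f"{DATA_TAG}\n{sanitize(data)}\n"
        f"{RESPONSE_TAG}\n"
    )
```

Formatting alone does not remove injected text from the data channel; it only guarantees that user data cannot forge a channel boundary. Ignoring instructions that appear inside the data channel is the job of the specially trained model.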

Enhancing the Capability and Robustness of Large Language Models through Reinforcement Learning-Driven Query Refinement

Signed-Prompt: A New Approach to Prevent Prompt Injection Attacks Against LLM-Integrated Applications

Defense Against Prompt Injection Attack by Leveraging Attack Techniques

Prompt Injection attack against LLM-integrated Applications

Fine-tuned Large Language Models (LLMs): Improved Prompt Injection Attacks Detection

Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection

Systematically Analyzing Prompt Injection Vulnerabilities in Diverse LLM Architectures

More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models

Aligning LLMs to Be Robust Against Prompt Injection

Automatic and Universal Prompt Injection Attacks against Large Language Models

SPML: A DSL for Defending Language Models Against Prompt Attacks

From Prompt Injections to SQL Injection Attacks: How Protected is Your LLM-Integrated Web Application?

SoK: Prompt Hacking of Large Language Models

Evaluating the Instruction-Following Robustness of Large Language Models to Prompt Injection

Imprompter: Tricking LLM Agents into Improper Tool Use

Soft Begging: Modular and Efficient Shielding of LLMs against Prompt Injection and Jailbreaking based on Prompt Tuning

Formalizing and Benchmarking Prompt Injection Attacks and Defenses

SQL Injection Jailbreak: a structural disaster of large language models

Defending Against Indirect Prompt Injection Attacks With Spotlighting

Defensive Prompt Patch: A Robust and Interpretable Defense of LLMs against Jailbreak Attacks