Prompt Injection attack against LLM-integrated Applications

Yi Liu,Gelei Deng,Yuekang Li,Kailong Wang,Zihao Wang,Xiaofeng Wang,Tianwei Zhang,Yepang Liu,Haoyu Wang,Yan Zheng,Yang Liu
2024-03-02
Abstract:Large Language Models (LLMs), renowned for their superior proficiency in language comprehension and generation, stimulate a vibrant ecosystem of applications around them. However, their extensive assimilation into various services introduces significant security risks. This study deconstructs the complexities and implications of prompt injection attacks on actual LLM-integrated applications. Initially, we conduct an exploratory analysis on ten commercial applications, highlighting the constraints of current attack strategies in practice. Prompted by these limitations, we subsequently formulate HouYi, a novel black-box prompt injection attack technique, which draws inspiration from traditional web injection attacks. HouYi is compartmentalized into three crucial elements: a seamlessly-incorporated pre-constructed prompt, an injection prompt inducing context partition, and a malicious payload designed to fulfill the attack objectives. Leveraging HouYi, we unveil previously unknown and severe attack outcomes, such as unrestricted arbitrary LLM usage and uncomplicated application prompt theft. We deploy HouYi on 36 actual LLM-integrated applications and discern 31 applications susceptible to prompt injection. 10 vendors have validated our discoveries, including Notion, which has the potential to impact millions of users. Our investigation illuminates both the possible risks of prompt injection attacks and the possible tactics for mitigation.
Cryptography and Security,Artificial Intelligence,Computation and Language,Software Engineering
What problem does this paper attempt to address?
The paper attempts to address the security risks posed by Prompt Injection Attacks in large language model (LLM) integrated applications. Specifically: 1. **Limitations of Existing Attack Methods**: Existing prompt injection attack methods are limited in their effectiveness in practical applications because different applications interpret prompts differently, and many applications have strict input and output format requirements, which restrict the effectiveness of traditional attack methods. 2. **Development of New Attack Techniques**: To overcome these limitations, the paper proposes a new black-box prompt injection attack technique called HOUYI. HOUYI achieves its attack goals through three key components (pre-built prompts, context-separated prompts, and malicious payloads), which can effectively bypass existing defense mechanisms. 3. **Validation in Practical Applications**: The paper deployed HOUYI in 36 real-world LLM integrated applications and found that 31 applications had prompt injection vulnerabilities. Additionally, 10 vendors have confirmed these findings, including Notion, which could potentially affect millions of users. In summary, the paper aims to reveal the risks of prompt injection attacks and proposes a new attack method to promote the development of stronger defense measures.