Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)

Apurv Verma, Satyapriya Krishna, Sebastian Gehrmann, Madhavan Seshadri, Anu Pradhan, Tom Ault, Leslie Barrett, David Rabinowitz, John Doucette, NhatHai Phan
2024-07-21
Abstract: Creating secure and resilient applications with large language models (LLMs) requires anticipating, adjusting to, and countering unforeseen threats. Red-teaming has emerged as a critical technique for identifying vulnerabilities in real-world LLM implementations. This paper presents a detailed threat model and provides a systematization of knowledge (SoK) of red-teaming attacks on LLMs. We develop a taxonomy of attacks based on the stages of the LLM development and deployment process and extract various insights from previous research. In addition, we compile methods for defense and practical red-teaming strategies for practitioners. By delineating prominent attack motifs and shedding light on various entry points, this paper provides a framework for improving the security and robustness of LLM-based systems.
Computation and Language, Cryptography and Security
What problem does this paper attempt to address?
This paper addresses how to systematically construct threat models for large language models (LLMs) and how to use red-teaming to identify and mitigate the risks LLMs pose in practical applications. Specifically, its objectives are to:
1. **Introduce threat models based on the entry points of the LLM development and deployment life cycle**: This allows reasoning about various types of attacks and their corresponding defense measures.
2. **Provide an attack taxonomy based on the proposed threat models**: The taxonomy is followed by a brief discussion of common defense methodologies.
3. **Systematically organize insights drawn from previously published work**: The goal is to distill the properties required for effective red-teaming and robust defense strategies.

Through these contributions, the paper aims to help researchers and practitioners address the complex challenges of LLM applications, especially the development of helpful, harmless, and honest LLMs (H3LLM). The paper also emphasizes the importance of red-teaming for evaluating the security of LLMs and for minimizing the risks of deploying them in human-facing products.