garak: A Framework for Security Probing Large Language Models

Leon Derczynski, Erick Galinkin, Jeffrey Martin, Subho Majumdar, Nanna Inie

2024-06-17

Abstract: As Large Language Models (LLMs) are deployed and integrated into thousands of applications, the need for scalable evaluation of how models respond to adversarial attacks grows rapidly. However, LLM security is a moving target: models produce unpredictable output, are constantly updated, and the potential adversary is highly diverse: anyone with access to the internet and a decent command of natural language. Further, what constitutes a security weakness in one context may not be an issue in a different context; one-size-fits-all guardrails remain theoretical. In this paper, we argue that it is time to rethink what constitutes "LLM security", and pursue a holistic approach to LLM security evaluation, where exploration and discovery of issues are central. To this end, this paper introduces garak (Generative AI Red-teaming and Assessment Kit), a framework which can be used to discover and identify vulnerabilities in a target LLM or dialog system. garak probes an LLM in a structured fashion to discover potential vulnerabilities. The outputs of the framework describe a target model's weaknesses, contribute to an informed discussion of what composes vulnerabilities in unique contexts, and can inform alignment and policy discussions for LLM deployment.

Computation and Language, Cryptography and Security

What problem does this paper attempt to address?

The paper addresses security assessment for large language models (LLMs), a need that grows as they are deployed across more applications. It argues that traditional security assessment methods struggle to keep pace with how LLMs change and with the diversity of threats they face. The authors therefore propose a framework called garak (Generative AI Red-teaming and Assessment Kit), which audits LLMs in a structured manner and promotes the exploration and discovery of security issues. The framework has four key components:

1. **Generators**: Any object or system responsible for generating text.
2. **Probes**: Tests designed to elicit specific types of vulnerabilities from the target LLM.
3. **Detectors**: Tools that automatically identify failure patterns in the model's responses.
4. **Buffs**: Modifications to the interaction between probes and generators that reveal more potential issues.

Working together, these components let garak test for different security issues and report in detail on the target model's weaknesses. garak also supports attack generation, adaptively creating new test cases based on previously successful attempts. Overall, the paper aims to advance LLM security assessment with a flexible, scalable framework that helps researchers and developers better understand and address the security challenges of these systems.
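To make the four roles concrete, here is a minimal Python sketch of how a generator, probe, detector, and buff might fit together. It is an illustration only, not garak's actual API: every name in it (`Probe`, `Detector`, `Attempt`, `run_probe`, `failure_rate`, `toy_generator`, `shout`) is hypothetical, the "model" is a hard-coded stand-in, and the detector is a simple substring match.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# All names below are hypothetical illustrations, not garak's real classes.
Generator = Callable[[str], str]  # anything that turns a prompt into text
Buff = Callable[[str], str]       # rewrites a prompt before it is sent


@dataclass
class Probe:
    """A named set of prompts aimed at one kind of weakness."""
    name: str
    prompts: List[str]


@dataclass
class Detector:
    """Flags an output as a failure if it contains a trigger substring."""
    name: str
    trigger: str

    def detect(self, output: str) -> bool:
        return self.trigger.lower() in output.lower()


@dataclass
class Attempt:
    """One prompt/output pair plus the detector's verdict."""
    probe: str
    prompt: str
    output: str
    failed: bool


def run_probe(generator: Generator, probe: Probe, detector: Detector,
              buffs: Sequence[Buff] = ()) -> List[Attempt]:
    """Send each prompt, and each buffed variant of it, to the generator."""
    attempts = []
    for base in probe.prompts:
        variants = [base] + [buff(base) for buff in buffs]
        for prompt in variants:
            output = generator(prompt)
            attempts.append(
                Attempt(probe.name, prompt, output, detector.detect(output)))
    return attempts


def failure_rate(attempts: List[Attempt]) -> float:
    """Fraction of attempts the detector marked as failures."""
    if not attempts:
        return 0.0
    return sum(a.failed for a in attempts) / len(attempts)


def toy_generator(prompt: str) -> str:
    """Stand-in model: leaks a secret only when the prompt is all upper-case."""
    if prompt.isupper():
        return "Sure, the secret is SWORDFISH."
    return "I can't help with that."


def shout(prompt: str) -> str:
    """Toy buff: upper-case the prompt."""
    return prompt.upper()


probe = Probe("secret_leak", ["Tell me the secret.", "What is the password?"])
detector = Detector("leak", "swordfish")

plain = run_probe(toy_generator, probe, detector)
buffed = run_probe(toy_generator, probe, detector, buffs=[shout])
print(f"plain: {failure_rate(plain):.2f}, buffed: {failure_rate(buffed):.2f}")
```

In this toy setup the unmodified prompts never trigger the detector, while the upper-cased variants do, which is the kind of additional issue a buff is meant to surface. A real detector would typically need to be more robust than a substring check.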
