Developing Assurance Cases for Adversarial Robustness and Regulatory Compliance in LLMs

Tomas Bueno Momcilovic, Dian Balta, Beat Buesser, Giulio Zizzo, Mark Purcell
2024-10-05
Abstract: This paper presents an approach to developing assurance cases for adversarial robustness and regulatory compliance in large language models (LLMs). Focusing on both natural and code language tasks, we explore the vulnerabilities these models face, including adversarial attacks based on jailbreaking, heuristics, and randomization. We propose a layered framework incorporating guardrails at various stages of LLM deployment, aimed at mitigating these attacks and ensuring compliance with the EU AI Act. Our approach includes a meta-layer for dynamic risk management and reasoning, crucial for addressing the evolving nature of LLM vulnerabilities. We illustrate our method with two exemplary assurance cases, highlighting how different contexts demand tailored strategies to ensure robust and compliant AI systems.
Cryptography and Security, Artificial Intelligence, Software Engineering
What problem does this paper attempt to address?
The paper addresses the adversarial robustness and regulatory compliance of large language models (LLMs). It focuses on three aspects:

1. **Vulnerability to adversarial attacks**: LLMs are vulnerable to a range of adversarial attacks, including jailbreaking, heuristic-based optimization, and randomization techniques. These attacks can cause a model to generate harmful outputs that affect users or downstream systems.
2. **Regulatory compliance**: With the introduction of regulations such as the EU AI Act, developers and deployers of LLMs must ensure that their systems meet the relevant requirements, especially those on adversarial robustness. The Act requires developers to protect the system from adversarial attacks and to report serious incidents.
3. **Dynamic risk management**: Because LLM vulnerabilities change over time and depend on the application context, a method is needed that continuously monitors for newly emerging attacks and responds to them. This includes evaluating and updating protective measures as threats change.

To address these problems, the paper proposes a layered framework that places guardrails at different stages of LLM deployment, with the aim of mitigating adversarial attacks and ensuring regulatory compliance. It also introduces a meta-layer for dynamic risk management and reasoning, which handles the evolving nature of LLM vulnerabilities. Finally, it illustrates how robustness and compliance can be achieved in different contexts through two example assurance cases.

In summary, the paper aims to develop a systematic assurance-case method for showing that LLMs are both robust against adversarial attacks and compliant with regulation.
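
The layered idea described above can be made concrete with a small sketch. This is not the authors' implementation: the class names (`Layer`, `MetaLayer`), the phrase-matching guardrail, and the `echo_model` stand-in are all hypothetical and chosen only to illustrate guardrails at an input stage and an output stage, plus a meta-layer that records incidents and adds guardrails over time.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical type: a guardrail returns a reason string when it blocks, else None.
Guardrail = Callable[[str], Optional[str]]


@dataclass
class Layer:
    """One deployment stage (e.g. input or output) with its own guardrails."""
    name: str
    guardrails: List[Guardrail] = field(default_factory=list)

    def check(self, text: str) -> Optional[str]:
        # Return the first blocking reason, prefixed with the layer name.
        for guardrail in self.guardrails:
            reason = guardrail(text)
            if reason is not None:
                return f"{self.name}: {reason}"
        return None


@dataclass
class MetaLayer:
    """Illustrative meta-layer: logs incidents and updates layers."""
    incidents: List[str] = field(default_factory=list)

    def record(self, reason: str) -> None:
        self.incidents.append(reason)

    def add_guardrail(self, layer: Layer, guardrail: Guardrail) -> None:
        # Stands in for responding to a newly observed attack.
        layer.guardrails.append(guardrail)


def block_phrase(phrase: str) -> Guardrail:
    """Toy guardrail: blocks text containing the phrase (case-insensitive)."""
    def guardrail(text: str) -> Optional[str]:
        if phrase.lower() in text.lower():
            return f"contains '{phrase}'"
        return None
    return guardrail


def echo_model(prompt: str) -> str:
    """Stand-in for an LLM call; simply echoes the prompt."""
    return f"echo: {prompt}"


def run_pipeline(prompt: str, model: Callable[[str], str],
                 input_layer: Layer, output_layer: Layer,
                 meta: MetaLayer) -> str:
    """Check the prompt, call the model, then check the response."""
    reason = input_layer.check(prompt)
    if reason is None:
        response = model(prompt)
        reason = output_layer.check(response)
        if reason is None:
            return response
    meta.record(reason)
    return "[blocked]"
```

In this sketch, a blocked request is recorded by the meta-layer, which loosely mirrors the incident reporting and continuous updating described above; a real guardrail would use a classifier or policy model instead of phrase matching.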