Abstract:This paper presents an approach to developing assurance cases for adversarial robustness and regulatory compliance in large language models (LLMs). Focusing on both natural and code language tasks, we explore the vulnerabilities these models face, including adversarial attacks based on jailbreaking, heuristics, and randomization. We propose a layered framework incorporating guardrails at various stages of LLM deployment, aimed at mitigating these attacks and ensuring compliance with the EU AI Act. Our approach includes a meta-layer for dynamic risk management and reasoning, crucial for addressing the evolving nature of LLM vulnerabilities. We illustrate our method with two exemplary assurance cases, highlighting how different contexts demand tailored strategies to ensure robust and compliant AI systems.

What problem does this paper attempt to address?

The main problems that this paper attempts to solve are the robustness and regulatory compliance issues of large - language models (LLMs) in the face of adversarial attacks. Specifically, the paper focuses on the following aspects: 1. **Vulnerability to adversarial attacks**: LLMs are vulnerable to various adversarial attacks, including but not limited to jailbreak attacks, heuristic - based optimization, and randomization techniques. These attacks may cause the model to generate harmful outputs, thereby causing harm to users or downstream systems. 2. **Regulatory compliance**: With the introduction of regulations such as the EU AI Act, LLMs developers and deployers need to ensure that their systems meet relevant regulatory requirements, especially those regarding adversarial robustness. The Act requires developers to protect the system from adversarial attacks and report serious incidents. 3. **Dynamic risk management**: Since the vulnerabilities of LLMs are dynamic and closely related to the application environment, a method that can continuously monitor and respond to newly emerging attacks is required. This involves how to evaluate and update protection measures to deal with constantly changing threats. To solve these problems, the paper proposes a hierarchical framework. By setting guardrails at different stages, it aims to mitigate adversarial attacks and ensure regulatory compliance. In addition, the paper introduces a meta - layer for dynamic risk management and reasoning to deal with the evolving nature of LLMs' vulnerabilities. Finally, the paper shows how to achieve robustness and compliance in different contexts through two example assurance cases. In summary, the goal of this paper is to develop a systematic assurance - case method to ensure that LLMs are both robust and compliant with regulations when facing adversarial attacks.

Developing Assurance Cases for Adversarial Robustness and Regulatory Compliance in LLMs

Towards Assuring EU AI Act Compliance and Adversarial Robustness of LLMs

Towards Assurance of LLM Adversarial Robustness using Ontology-Driven Argumentation

Knowledge-Augmented Reasoning for EUAIA Compliance and Adversarial Robustness of LLMs

Current state of LLM Risks and AI Guardrails

ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming

Do LLMs Have Political Correctness? Analyzing Ethical Biases and Jailbreak Vulnerabilities in AI Systems

Rethinking Legal Compliance Automation: Opportunities with Large Language Models

Assessing Hidden Risks of LLMs: An Empirical Study on Robustness, Consistency, and Credibility

Enhancing Adversarial Resistance in LLMs with Recursion

Exploring the Adversarial Capabilities of Large Language Models

SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks

The Ethics of Interaction: Mitigating Security Threats in LLMs

Fortifying Ethical Boundaries in AI: Advanced Strategies for Enhancing Security in Large Language Models

Breaking the Silence: the Threats of Using LLMs in Software Engineering

"I Always Felt that Something Was Wrong.": Understanding Compliance Risks and Mitigation Strategies when Professionals Use Large Language Models

Threat Modelling and Risk Analysis for Large Language Model (LLM)-Powered Applications

Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities

Supporting Human-AI Collaboration in Auditing LLMs with LLMs

Trustworthy AI: Securing Sensitive Data in Large Language Models

GUARD-D-LLM: An LLM-Based Risk Assessment Engine for the Downstream uses of LLMs