Abstract: Large Language Model (LLM) systems are inherently compositional, with an individual LLM serving as the core foundation and additional layers of objects such as plugins, sandboxes, and so on built around it. Along with their great potential, there are increasing concerns over the security of such probabilistic intelligent systems. However, existing studies on LLM security often focus on the individual LLM, without examining the ecosystem through the lens of LLM systems that include other objects (e.g., Frontend, Webtool, Sandbox, and so on). In this paper, we systematically analyze the security of LLM systems instead of focusing on individual LLMs. To do so, we build on information flow and formulate the security of LLM systems as constraints on the alignment of the information flow within the LLM and between the LLM and other objects. Based on this construction and the unique probabilistic nature of LLMs, the attack surface of an LLM system can be decomposed into three key components: (1) multi-layer security analysis, (2) analysis of the existence of constraints, and (3) analysis of the robustness of these constraints. To ground this new attack surface, we propose a multi-layer and multi-step approach and apply it to a state-of-the-art LLM system, OpenAI GPT4. Our investigation exposes several security issues, not just within the LLM itself but also in its integration with other components. We found that although OpenAI GPT4 incorporates numerous safety constraints, these constraints are still vulnerable to attackers. To further demonstrate the real-world threat posed by the vulnerabilities we discovered, we construct an end-to-end attack in which an adversary can illicitly acquire the user's chat history, all without manipulating the user's input or gaining direct access to OpenAI GPT4. Our demo is available at: https://fzwark.github.io/LLM-System-Attack-Demo/

Is osteoporosis a pediatric disease? Peak bone mass attainment in the adolescent female.

Red Teaming Language Model Detectors with Language Models

DualFlow: Generating imperceptible adversarial examples by flow field and normalize flow-based model

Learning diverse attacks on large language models for robust red-teaming and safety tuning

Arondight: Red Teaming Large Vision Language Models with Auto-generated Multi-modal Jailbreak Prompts

FMM-Attack: A Flow-based Multi-modal Adversarial Attack on Video-based LLMs

L-AutoDA: Leveraging Large Language Models for Automated Decision-based Adversarial Attacks

Defending Large Language Models Against Attacks With Residual Stream Activation Analysis

Exploring the Adversarial Capabilities of Large Language Models

Visual Adversarial Examples Jailbreak Aligned Large Language Models

Misusing Tools in Large Language Models With Visual Adversarial Examples

Large Language Model Sentinel: Advancing Adversarial Robustness by LLM Agent

Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks

Large Language Model Sentinel: LLM Agent for Adversarial Purification

Purple-teaming LLMs with Adversarial Defender Training

Detecting AI Flaws: Target-Driven Attacks on Internal Faults in Language Models

SEAS: Self-Evolving Adversarial Safety Optimization for Large Language Models

Targeting the Core: A Simple and Effective Method to Attack RAG-based Agents via Direct LLM Manipulation

A Trembling House of Cards? Mapping Adversarial Attacks against Language Agents

Recent advancements in LLM Red-Teaming: Techniques, Defenses, and Ethical Considerations

A New Era in LLM Security: Exploring Security Concerns in Real-World LLM-based Systems