Abstract: The rapid advancement of large language models (LLMs) has reshaped artificial intelligence, but their deployment in sensitive domains raises serious concerns, particularly because they are susceptible to malicious exploitation. This exposes the shortcomings of pre-deployment testing and the need for more rigorous and comprehensive evaluation methods. This study presents an empirical analysis of how effectively conventional coverage criteria identify these vulnerabilities, with a particular emphasis on jailbreak attacks. Our investigation begins with a clustering analysis of the hidden states in LLMs, demonstrating that intrinsic characteristics of these states can distinguish between different types of queries. We then assess the performance of the coverage criteria at three levels: criterion level, layer level, and token level. Our findings reveal significant differences in neuron activation patterns between the processing of normal and jailbreak queries, corroborating the clustering results. Building on these findings, we propose an approach for the real-time detection of jailbreak attacks that uses neural activation features. Our classifier achieves an average accuracy of 96.33% in identifying jailbreak queries, including those that could lead to adversarial attacks. Because detection can start from the model's first output token, the method is a promising fit for future systems that integrate LLMs and need real-time detection. This study advances our understanding of LLM security testing and lays a foundation for the development of more resilient AI systems.
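The idea of classifying queries from neuron activation features can be illustrated with a small sketch. This is not the paper's classifier: it assumes a simple threshold-based neuron coverage per layer as the feature, uses a nearest-centroid rule as a stand-in classifier, and runs on made-up activation values. All function names and numbers below are hypothetical.

```python
from typing import Dict, List, Sequence


def neuron_coverage(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of neurons in one layer whose activation exceeds `threshold`."""
    if not activations:
        return 0.0
    return sum(a > threshold for a in activations) / len(activations)


def coverage_features(layers: Sequence[Sequence[float]], threshold: float = 0.0) -> List[float]:
    """One coverage value per layer, used as the feature vector for a query."""
    return [neuron_coverage(layer, threshold) for layer in layers]


def fit_centroids(features: Sequence[Sequence[float]], labels: Sequence[str]) -> Dict[str, List[float]]:
    """Average the feature vectors of each class (stand-in for a real classifier)."""
    grouped: Dict[str, List[Sequence[float]]] = {}
    for feature, label in zip(features, labels):
        grouped.setdefault(label, []).append(feature)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in grouped.items()
    }


def predict(centroids: Dict[str, List[float]], feature: Sequence[float]) -> str:
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], feature))


# Hypothetical per-layer activations for two queries (two layers, four neurons each).
normal_query = [[0.5, -0.2, 0.1, -0.3], [0.4, -0.1, -0.5, -0.2]]
jailbreak_query = [[0.9, 0.3, 0.2, -0.1], [0.7, 0.6, 0.1, -0.4]]

train_features = [coverage_features(normal_query), coverage_features(jailbreak_query)]
train_labels = ["normal", "jailbreak"]
centroids = fit_centroids(train_features, train_labels)

# A new, equally hypothetical query with mostly active neurons.
new_query = [[0.8, 0.4, -0.2, 0.1], [0.6, 0.2, 0.3, -0.1]]
label = predict(centroids, coverage_features(new_query))
```

In practice the activations would come from the hidden states of an actual LLM, and the stand-in rule would be replaced by a trained classifier. The sketch only shows how per-layer activation statistics can serve as input features.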

Inferring State Machine from the Protocol Implementation via Large Language Model

ABInfer: A Novel Field Boundaries Inference Approach for Protocol Reverse Engineering

State Machine Based Malicious Packet Attack Detection and Security Situation Assessment

Extracting Protocol Format as State Machine via Controlled Static Loop Analysis

Automatic State Machine Inference for Binary Protocol Reverse Engineering

Recent Advances in Attack and Defense Approaches of Large Language Models

How Far Have We Gone in Vulnerability Detection Using Large Language Models

AutoAttacker: A Large Language Model Guided System to Implement Automatic Cyber-attacks

Can Large Language Models Help Developers with Robotic Finite State Machine Modification?

A model-guided symbolic execution approach for network protocol implementations and vulnerability detection

A Preliminary Study on Using Large Language Models in Software Pentesting

Exploring Advanced Methodologies in Security Evaluation for LLMs

Large Language Model Supply Chain: Open Problems From the Security Perspective

Investigating Coverage Criteria in Large Language Models: An In-Depth Study Through Jailbreak Attacks

Harnessing the Power of LLM to Support Binary Taint Analysis

Harnessing Large Language Models for Software Vulnerability Detection: A Comprehensive Benchmarking Study

Enhancing Automata Learning with Statistical Machine Learning: A Network Security Case Study

States Hidden in Hidden States: LLMs Emerge Discrete State Representations Implicitly

Large Language Models for Code Analysis: Do LLMs Really Do Their Job?

ThreatModeling-LLM: Automating Threat Modeling using Large Language Models for Banking System