Abstract: State machines play a pivotal role in augmenting the efficacy of protocol analysis to unveil more vulnerabilities. However, the task of inferring state machines from network protocol implementations presents significant challenges. Traditional methods based on dynamic analysis often overlook crucial state transitions due to limited coverage, while static analysis faces difficulties with complex code structures and behaviors. To address these limitations, we propose an innovative state machine inference approach powered by Large Language Models (LLMs). Utilizing text-embedding technology, this method allows LLMs to dissect and analyze the intricacies of protocol implementation code. Through targeted prompt engineering, we systematically identify and infer the underlying state machines. Our evaluation across six protocol implementations demonstrates the method's high efficacy, achieving an accuracy rate exceeding 90% and successfully delineating differences in state machines among various implementations of the same protocol. Importantly, integrating this approach with protocol fuzzing has notably enhanced AFLNet's code coverage by 10% over RFCNLP, showcasing the considerable potential of LLMs in advancing network protocol security analysis. Our proposed method not only marks a significant step forward in accurate state machine inference but also opens new avenues for improving the security and reliability of protocol implementations.

What problem does this paper attempt to address?

### What problems does this paper attempt to solve?

This paper addresses the challenge of inferring state machines from network protocol implementations. Specifically, the authors focus on overcoming the limitations of traditional state machine extraction methods and propose a new method based on large language models (LLMs).

#### 1. **Limitations of Traditional Methods**

- **Dynamic Analysis**: Methods based on dynamic analysis often miss critical state transitions because of limited coverage.
- **Static Analysis**: Static analysis methods struggle with complex code structures and behaviors, and are prone to the path explosion problem.

#### 2. **The Proposed New Method**

To address these problems, the authors propose a state machine inference method built on LLMs. Using text embedding techniques and carefully designed prompts (prompt engineering), the method enables LLMs to parse and analyze the details of protocol implementation code, and thereby to systematically identify and infer the underlying state machine.

#### 3. **Specific Objectives**

- **Improve Accuracy**: In experiments on six protocol implementations, the method's average accuracy exceeds 90%, and it reveals differences in state machines between implementations of the same protocol.
- **Enhance Protocol Fuzzing**: Applying the inferred state machines to the protocol fuzzing tool AFLNet improves its code coverage by 10% over RFCNLP.
- **Promote Protocol Security Analysis**: Beyond more accurate state machine inference, the method opens up new ways to improve the security and reliability of protocol implementations.

#### 4. **Contributions**

- **Innovative Method**: LLMs are applied for the first time to infer state machines from protocol implementations, extracting protocol states, message types, and the transitions between them.
- **Automated Tool**: The authors developed the ProtocolGPT tool, which automatically infers state machines from protocol implementations.
- **Experimental Evidence**: A series of experiments verifies the effectiveness of LLMs in extracting state machines from protocol implementations and demonstrates their ability to find differences between implementations.

### Summary

The core problem of this paper is to find a more efficient and accurate way to infer state machines from network protocol implementations, overcoming the limitations of existing methods and providing new tools and technical support for protocol security analysis.
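To make the summary's terms concrete, the following is a minimal sketch of how an inferred state machine could be represented as a set of (state, message type, next state) transitions, how two implementations could be compared, and how inferred transitions could be scored against a reference. It is not the paper's implementation: the `Transition` type, the helper names, the precision/recall scoring, and the simplified login flows are all assumptions made for illustration, and the paper's own accuracy metric may be defined differently.

```python
from typing import NamedTuple


class Transition(NamedTuple):
    source: str   # state before the message is processed
    message: str  # message type that triggers the transition
    target: str   # state after the message is processed


def diff_state_machines(a: set[Transition], b: set[Transition]) -> dict[str, set[Transition]]:
    """Split transitions into those shared by both machines and those unique to each."""
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


def precision_recall(inferred: set[Transition], reference: set[Transition]) -> tuple[float, float]:
    """Score inferred transitions against a reference set (an assumed metric)."""
    correct = len(inferred & reference)
    precision = correct / len(inferred) if inferred else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical, simplified login flows for two servers of the same protocol.
impl_a = {
    Transition("INIT", "USER", "AUTH_PENDING"),
    Transition("AUTH_PENDING", "PASS", "LOGGED_IN"),
    Transition("LOGGED_IN", "QUIT", "CLOSED"),
}
impl_b = impl_a | {Transition("INIT", "QUIT", "CLOSED")}

delta = diff_state_machines(impl_a, impl_b)
# Treat impl_b as the reference purely to exercise the scoring helper.
precision, recall = precision_recall(impl_a, impl_b)
```

Here `delta["only_b"]` holds the single transition that the second hypothetical server accepts and the first does not, which is the kind of implementation difference the summary refers to. Storing transitions as a set keeps both the comparison and the scoring down to plain set operations.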

Inferring State Machine from the Protocol Implementation via Large Language Model

ABInfer: A Novel Field Boundaries Inference Approach for Protocol Reverse Engineering

State Machine Based Malicious Packet Attack Detection and Security Situation Assessment

Extracting Protocol Format as State Machine via Controlled Static Loop Analysis

Automatic State Machine Inference for Binary Protocol Reverse Engineering

Recent Advances in Attack and Defense Approaches of Large Language Models

How Far Have We Gone in Vulnerability Detection Using Large Language Models

Can Large Language Models Help Developers with Robotic Finite State Machine Modification?

AutoAttacker: A Large Language Model Guided System to Implement Automatic Cyber-attacks

A model-guided symbolic execution approach for network protocol implementations and vulnerability detection

A Preliminary Study on Using Large Language Models in Software Pentesting

Large Language Model Supply Chain: Open Problems From the Security Perspective

Investigating Coverage Criteria in Large Language Models: An In-Depth Study Through Jailbreak Attacks

Enhancing Automata Learning with Statistical Machine Learning: A Network Security Case Study

Modeling and Testing of Network Protocols with Parallel State Machines.

Exploring Advanced Methodologies in Security Evaluation for LLMs

Harnessing Large Language Models for Software Vulnerability Detection: A Comprehensive Benchmarking Study

On the Uses of Large Language Models to Interpret Ambiguous Cyberattack Descriptions

Understanding the Effectiveness of Large Language Models in Detecting Security Vulnerabilities

Large Language Models in Cybersecurity: State-of-the-Art