Inferring State Machine from the Protocol Implementation via Large Language Model

Haiyang Wei, Zhengjie Du, Haohui Huang, Yue Liu, Guang Cheng, Linzhang Wang, Bing Mao
2024-06-14
Abstract: State machines play a pivotal role in augmenting the efficacy of protocol analysis to unveil more vulnerabilities. However, the task of inferring state machines from network protocol implementations presents significant challenges. Traditional methods based on dynamic analysis often overlook crucial state transitions due to limited coverage, while static analysis faces difficulties with complex code structures and behaviors. To address these limitations, we propose an innovative state machine inference approach powered by Large Language Models (LLMs). Utilizing text-embedding technology, this method allows LLMs to dissect and analyze the intricacies of protocol implementation code. Through targeted prompt engineering, we systematically identify and infer the underlying state machines. Our evaluation across six protocol implementations demonstrates the method's high efficacy, achieving an accuracy rate exceeding 90% and successfully delineating differences in state machines among various implementations of the same protocol. Importantly, integrating this approach with protocol fuzzing has notably enhanced AFLNet's code coverage by 10% over RFCNLP, showcasing the considerable potential of LLMs in advancing network protocol security analysis. Our proposed method not only marks a significant step forward in accurate state machine inference but also opens new avenues for improving the security and reliability of protocol implementations.
Cryptography and Security
What problem does this paper attempt to address?
This paper addresses the challenge of inferring state machines from network protocol implementations. Specifically, the authors examine why traditional methods fall short when extracting state machines and propose a new method based on large language models (LLMs).

#### 1. **Limitations of Traditional Methods**

- **Dynamic Analysis**: Methods based on dynamic analysis often miss critical state transitions because of limited coverage.
- **Static Analysis**: Static analysis struggles with complex code structures and behaviors and is prone to the path explosion problem.

#### 2. **The Proposed New Method**

To address these limitations, the authors propose a state machine inference method built on LLMs. Using text embedding techniques and carefully designed prompts (prompt engineering), the method enables LLMs to parse and analyze the details of the protocol implementation code and thereby systematically identify and infer the underlying state machine.

#### 3. **Specific Objectives**

- **Improve Accuracy**: In experiments on six protocol implementations, the method's average accuracy exceeds 90%, and it reveals differences between the state machines of different implementations of the same protocol.
- **Enhance Protocol Fuzzing**: Applying the inferred state machine to the protocol fuzzing tool AFLNet raises its code coverage by 10% over RFCNLP.
- **Promote Protocol Security Analysis**: Beyond improving the accuracy of state machine inference, the method opens up new ways to improve the security and reliability of protocol implementations.

#### 4. **Contributions**

- **Innovative Method**: For the first time, LLMs are applied to infer state machines from protocol implementations, accurately extracting protocol states, message types, and the transitions between them.
- **Automated Tool**: The authors developed the ProtocolGPT tool, which automatically infers state machines from protocol implementations.
- **Experimental Evidence**: A series of experiments verifies the effectiveness of LLMs in extracting state machines from protocol implementations and demonstrates their ability to find differences between implementations.

### Summary

The core problem of this paper is to find a more efficient and accurate way to infer state machines from network protocol implementations, overcoming the limitations of existing methods and providing new tools and technical support for protocol security analysis.
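
### Illustrative Sketch

To make the idea of "differences between state machines" concrete, the following is a minimal sketch that treats a state machine as a set of `(source state, message type, target state)` transitions, matching the three elements the summary says are extracted. It is not the authors' tool or evaluation code: the `Transition` type, both helper functions, and the toy protocol data are hypothetical, and the precision/recall metric is an assumption, since the text above does not say how accuracy was computed.

```python
from typing import NamedTuple


class Transition(NamedTuple):
    """One edge of a protocol state machine: (source, message) -> target."""
    source: str
    message: str
    target: str


def diff_state_machines(a: set[Transition], b: set[Transition]) -> dict[str, set[Transition]]:
    """Split the transitions into those unique to each machine and those shared."""
    return {"only_a": a - b, "only_b": b - a, "shared": a & b}


def precision_recall(inferred: set[Transition], reference: set[Transition]) -> tuple[float, float]:
    """Score inferred transitions against a reference set (assumed metric)."""
    correct = len(inferred & reference)
    precision = correct / len(inferred) if inferred else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical toy machines for two implementations of the same protocol.
impl_a = {
    Transition("INIT", "HELLO", "GREETED"),
    Transition("GREETED", "AUTH", "READY"),
    Transition("READY", "QUIT", "CLOSED"),
}
impl_b = {
    Transition("INIT", "HELLO", "GREETED"),
    Transition("GREETED", "AUTH", "READY"),
    Transition("GREETED", "QUIT", "CLOSED"),  # accepts QUIT before AUTH
}

delta = diff_state_machines(impl_a, impl_b)
precision, recall = precision_recall(impl_a, impl_b)

for name, transitions in delta.items():
    print(name, sorted(transitions))
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Modeling transitions as hashable tuples means that comparing two implementations reduces to plain set operations. A real comparison would likely also need to normalize state and message names first, because two code bases rarely use identical identifiers.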