Abstract: This study investigates the capabilities of Large Language Models (LLMs), specifically GPT-4, in the context of Binary Reverse Engineering (RE). Employing a structured experimental approach, we analyzed the LLM's performance in interpreting and explaining human-written and decompiled code. The research encompassed two phases: the first on basic code interpretation and the second on more complex malware analysis. Key findings indicate that LLMs are proficient at general code understanding, with varying effectiveness in detailed technical and security analyses. The study underscores the potential and current limitations of LLMs in reverse engineering, revealing crucial insights for future applications and improvements. We also examined our experimental methodology, including the evaluation methods and data constraints, which gave us a technical outlook for future research in this field.

What problem does this paper attempt to address?

The problem this paper addresses is evaluating how effective large language models (LLMs), especially GPT-4, are at binary reverse engineering (RE). Specifically, the research explores the following questions:

1. **Code interpretation ability**: Can LLMs effectively interpret and understand human-written code and decompiled code? This covers both simple code and complex malware code.
2. **Malware analysis**: How do LLMs perform when processing and analyzing malware? This includes identifying malware functionality, network behavior, potential security vulnerabilities, etc.
3. **Technical limitations and improvement directions**: What are the current limitations of LLMs in binary reverse engineering? How can their performance be improved through changes to the model or the method?

### Research background and motivation

As LLMs have developed, their performance on natural language processing (NLP) tasks has steadily improved, especially in understanding and generating context-dependent text. However, their potential in computer science, and in binary reverse engineering in particular, has not been fully explored. Binary reverse engineering is a complex and time-consuming task that involves recovering the underlying code and design logic of binary files. Researchers are therefore interested in how LLMs perform in this field, hoping the models can simplify and accelerate the reverse engineering process.

### Research objectives

To answer the above questions, the research sets two main objectives:

1. **Explore the performance of LLMs in basic code interpretation**: Through a series of experiments, evaluate the ability of LLMs to interpret simple C program code and decompiled code.
2. **Evaluate the performance of LLMs in malware analysis**: Through more complex experiments, test the effectiveness of LLMs in interpreting and analyzing real-world malware code.

### Experimental design

The research is carried out in two stages:

1. **First stage: Basic code interpretation**
   - **Scenario 1**: Interpret the original code (with comments)
   - **Scenario 2**: Interpret the code without comments
   - **Scenario 3**: Interpret the decompiled code
2. **Second stage: Malware analysis**
   - **Scenarios 1 & 2**: Rename the decompiled functions and variables
   - **Scenario 3**: Evaluate code properties through binary questions
   - **Scenario 4**: Conduct a comprehensive analysis from three perspectives: function overview, key observations, and security analysis
   - **Scenario 5**: Conduct in-depth analysis of code structure and function through questionnaires

### Key findings

1. **LLMs show some ability in basic code interpretation**, but have limitations in interpreting decompiled code.
2. **In malware analysis**, LLMs can identify some basic code structures and functions, but perform poorly at detecting covert techniques and certain malicious activities.
3. **The research reveals the potential and limitations of LLMs in reverse engineering**, providing directions for future research and improvement.

Through these experiments, the researchers aim to provide a theoretical basis and technical support for applying LLMs to binary reverse engineering, and to lay the foundation for further research and development.
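To make the binary-question scenario from the experimental design more concrete, the following is a minimal sketch of how yes/no answers from a model could be scored against analyst-provided ground truth. It is not the paper's evaluation code: the `BinaryQuestion` structure, the `parse_yes_no` heuristic, the sample questions, and the rule that unparseable replies count as incorrect are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BinaryQuestion:
    """One yes/no question about a code sample (hypothetical structure)."""
    text: str
    expected: bool  # ground-truth answer, assumed to come from a human analyst


def parse_yes_no(answer: str) -> Optional[bool]:
    """Map a free-text reply to True/False, or None if it is unclear.

    Assumption: only the first word of the reply is inspected.
    """
    words = answer.strip().lower().split()
    if not words:
        return None
    first = words[0].strip(".,:;!")
    if first == "yes":
        return True
    if first == "no":
        return False
    return None


def score(questions: List[BinaryQuestion], answers: List[str]) -> float:
    """Return the fraction of questions answered correctly.

    Assumption: replies that cannot be parsed count as incorrect.
    """
    if not questions:
        return 0.0
    correct = sum(
        1 for q, a in zip(questions, answers) if parse_yes_no(a) == q.expected
    )
    return correct / len(questions)


# Hypothetical questions and model replies, for illustration only.
questions = [
    BinaryQuestion("Does the code open a network socket?", True),
    BinaryQuestion("Does the code modify the registry?", False),
    BinaryQuestion("Does the code encrypt files?", True),
]
answers = ["Yes, it calls connect().", "No.", "It is unclear from this function."]
accuracy = score(questions, answers)
```

Here two of the three replies match the ground truth and the third cannot be parsed, so `accuracy` is 2/3. A real study would need a more robust answer parser and a considered policy for ambiguous replies.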

Exploring the Efficacy of Large Language Models (GPT-4) in Binary Reverse Engineering
