Using LLMs to Automate Threat Intelligence Analysis Workflows in Security Operation Centers

PeiYu Tseng, ZihDwo Yeh, Xushu Dai, Peng Liu
2024-07-18
Abstract: SIEM systems are prevalent and play a critical role in a variety of analyst workflows in Security Operation Centers. However, modern SIEMs face a big challenge: they still cannot relieve analysts from the repetitive tasks involved in analyzing CTI (Cyber Threat Intelligence) reports written in natural languages. This project aims to develop an AI agent to replace the labor-intensive repetitive tasks involved in analyzing CTI reports. The agent exploits the revolutionary capabilities of LLMs (e.g., GPT-4), but it does not require any human intervention.
Cryptography and Security
What problem does this paper attempt to address?
The paper addresses the following problem: modern security information and event management (SIEM) systems still cannot free security analysts from the repetitive tasks involved in analyzing cyber threat intelligence (CTI) reports. CTI reports are usually published in natural language, so analysts must spend a great deal of time reading and analyzing them, which increases the response time to attacks.

### Core of the Problem

1. **Repetitive Tasks**: Security analysts need to manually extract important information from a large number of CTI reports and convert it into rules (such as regular expressions) that can be used in SIEM systems. This process is time-consuming and error-prone.
2. **Lack of Automation**: Although existing machine-learning techniques can automatically extract some fragments of information, these models generalize poorly and cannot handle new or complex threat intelligence.

### Solutions

To solve these problems, the paper proposes an AI agent based on large language models (LLMs) that is able to:

- **Automatically Extract Important Information**: Identify and extract key indicators of compromise (IOCs) from CTI reports, such as file names, command lines, and registry keys.
- **Generate Regular Expressions (Regex)**: Convert the extracted IOCs into regular expressions suitable for SIEM systems, enabling more complex pattern matching.
- **Construct Relationship Diagrams**: Analyze the dependencies between different IOCs and generate relationship diagrams that help analysts better understand attack models.
- **Operate Without Manual Intervention**: Run the entire process fully automatically, without human participation, which improves efficiency and reduces human error.

### Technical Challenges and Solutions

1. **Factual Errors**: LLMs may produce factual errors, so a voting mechanism and retrieval-augmented filtering are used to filter the LLM output.
2. **Generation of Regular Expressions**: The LLM needs to distinguish between capturing groups and non-capturing groups to generate correct regular expressions. For this purpose, the paper proposes a retrieval-augmented matching mechanism.
3. **Identification of Dependencies**: To accurately identify the dependencies between IOCs, the paper uses the LLM to re-parse the original paragraphs and standardizes these relationships through verb classification and mapping.
4. **Verification Mechanism**: The dependencies in the relationship diagrams are verified against predefined rules to ensure their accuracy.

### Summary

The main contribution of this paper is a new AI agent that uses the capabilities of LLMs to achieve a high degree of automation in the CTI analysis process, reducing the workload of security analysts and improving efficiency and accuracy.
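
### Illustrative Sketch

To make the voting and regex-generation steps described above more concrete, the following Python sketch shows one way they could look. It is not the paper's implementation: the function names `majority_vote` and `ioc_to_regex`, the strict-majority threshold, and the sample IOCs are all assumptions made for illustration, and the retrieval-augmented filtering and matching steps are not modeled here.

```python
import re
from collections import Counter


def majority_vote(runs, min_votes=None):
    """Keep IOCs reported by at least `min_votes` independent extraction runs.

    `runs` is a list of IOC lists, one per (hypothetical) LLM call.
    By default a strict majority of runs is required (an assumption).
    """
    if min_votes is None:
        min_votes = len(runs) // 2 + 1
    # Count each IOC at most once per run so duplicates inside a run do not inflate votes.
    counts = Counter(ioc for run in runs for ioc in set(run))
    return sorted(ioc for ioc, votes in counts.items() if votes >= min_votes)


def ioc_to_regex(ioc, variable_parts=()):
    """Turn a literal IOC into a regex string.

    The IOC is escaped so it matches literally; each entry in
    `variable_parts` (for example a user name inside a path) is replaced
    by a non-capturing group that matches one path segment.
    """
    pattern = re.escape(ioc)
    for part in variable_parts:
        # (?:...) groups without capturing, so the rule exposes no capture groups.
        pattern = pattern.replace(re.escape(part), r"(?:[^\\/]+)")
    return pattern


# Hypothetical outputs of three extraction runs over the same report.
runs = [
    ["svch0st.exe", "powershell -enc"],
    ["svch0st.exe", "powershell -enc", "notepad.exe"],
    ["svch0st.exe"],
]
agreed_iocs = majority_vote(runs)

# Hypothetical file-path IOC whose user name should be generalized.
path_rule = ioc_to_regex(
    r"C:\Users\alice\AppData\Local\Temp\svch0st.exe",
    variable_parts=["alice"],
)
```

In this sketch, `agreed_iocs` drops `notepad.exe` because only one of the three runs reported it, and `path_rule` matches the same path under any user name while containing no capturing groups. A real pipeline would also need to decide which parts of an IOC are variable, which is the step the paper delegates to the LLM.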