AttacKG+: Boosting Attack Knowledge Graph Construction with Large Language Models

Yongheng Zhang, Tingwen Du, Yunshan Ma, Xiang Wang, Yi Xie, Guozheng Yang, Yuliang Lu, Ee-Chien Chang
2024-05-08
Abstract: Attack knowledge graph construction seeks to convert textual cyber threat intelligence (CTI) reports into structured representations, portraying the evolutionary traces of cyber attacks. Even though previous research has proposed various methods to construct attack knowledge graphs, they generally suffer from limited generalization capability to diverse knowledge types as well as the requirement of expertise in model design and tuning. To address these limitations, we seek to utilize Large Language Models (LLMs), which have achieved enormous success in a broad range of tasks owing to their exceptional capabilities in both language understanding and zero-shot task fulfillment. Thus, we propose AttacKG+, a fully automatic LLM-based framework for constructing attack knowledge graphs. Our framework consists of four consecutive modules: rewriter, parser, identifier, and summarizer, each of which is implemented by instruction prompting and in-context learning empowered by LLMs. Furthermore, we upgrade the existing attack knowledge schema and propose a comprehensive version. We represent a cyber attack as a temporally unfolding event, each temporal step of which encapsulates three layers of representation: a behavior graph, MITRE TTP labels, and a state summary. Extensive evaluation demonstrates that: 1) our formulation seamlessly satisfies the information needs in threat event analysis, 2) our construction framework is effective in faithfully and accurately extracting the information defined by AttacKG+, and 3) our attack graph directly benefits downstream security practices such as attack reconstruction. All the code and datasets will be released upon acceptance.
Cryptography and Security, Artificial Intelligence
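The abstract describes the upgraded schema as a temporally unfolding event whose steps each carry three layers: a behavior graph, MITRE TTP labels, and a state summary. The following is a minimal Python sketch of that data model. It assumes a behavior graph can be represented as a list of (source, action, target) edges. The class names (`Edge`, `TemporalStep`, `AttackEvent`), the `technique_chain` helper, and the example content are hypothetical illustrations, not the paper's released code.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    """One behavior: a subject entity acts on an object entity (assumed edge format)."""
    source: str
    action: str
    target: str


@dataclass
class TemporalStep:
    """One step of the attack, carrying the three representation layers."""
    behavior_graph: list[Edge]   # layer 1: behavior graph as edges over entities
    ttp_labels: list[str]        # layer 2: MITRE ATT&CK technique IDs
    state_summary: str           # layer 3: natural-language state summary

    def entities(self) -> set[str]:
        # Nodes of the behavior graph are the endpoints of its edges.
        return {e.source for e in self.behavior_graph} | {e.target for e in self.behavior_graph}


@dataclass
class AttackEvent:
    """An attack as an ordered sequence of temporal steps."""
    steps: list[TemporalStep] = field(default_factory=list)

    def technique_chain(self) -> list[str]:
        # Hypothetical helper: flatten TTP labels in temporal order.
        return [label for step in self.steps for label in step.ttp_labels]


# Illustrative two-step event; the content is made up for demonstration.
event = AttackEvent([
    TemporalStep([Edge("attacker", "send", "phishing email"),
                  Edge("phishing email", "deliver", "victim")],
                 ["T1566"], "Initial access attempted via phishing."),
    TemporalStep([Edge("victim", "open", "macro document")],
                 ["T1204"], "Victim executed the malicious document."),
])
```

Keeping the three layers side by side in each step means a consumer can read the TTP sequence alone (as `technique_chain` does) or drill into the behavior graph of a single step without re-parsing the report.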
What problem does this paper attempt to address?
### The Problem Addressed by This Paper

This paper addresses the problem of constructing attack knowledge graphs from Cyber Threat Intelligence (CTI) reports. Specifically:

1. **Limitations of Existing Methods**:
   - **Limited Semantic Understanding**: Existing models struggle with diverse attack scenarios and knowledge types. They are typically trained on small datasets and have relatively small model sizes, which makes it difficult for them to handle the variety of security knowledge found in open scenarios.
   - **Strong Dependency on Model Design**: Current methods require specially designed natural language processing or graph-matching models, plus substantial human effort for fine-tuning. This is a barrier for security practitioners who lack the relevant background knowledge.
2. **Leveraging the Advantages of Large Language Models (LLMs)**:
   - LLMs are pre-trained on large-scale open knowledge data, which gives them strong contextual understanding and knowledge reasoning capabilities and lets them understand many types of knowledge across different domains.
   - LLMs can perform zero-shot and few-shot tasks through instruction following and in-context learning, without a specially designed model structure or training on a task-specific dataset.

   Therefore, using LLMs to construct attack knowledge graphs can effectively address both limitations above.
3. **Proposed New Framework AttacKG+**:
   - The paper proposes AttacKG+, a fully automated LLM-based framework with four modules: Rewriter, Parser, Identifier, and Summarizer.
   - The framework converts CTI reports into structured attack knowledge graphs through successive rewriting, parsing, identifying, and summarizing steps, with each module leveraging the capabilities of LLMs.

Through these methods, the paper aims to improve the accuracy and generalization of attack knowledge graph construction while simplifying the model design process, making it more user-friendly.
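To make the four-module flow concrete, here is a hedged Python sketch that chains a Rewriter, Parser, Identifier, and Summarizer, each as a single prompt to a language model. Everything in it is an assumption for illustration: the `LLM` callable type, the prompt templates, the one-behavior-per-line segmentation, and the `fake_llm` stand-in that returns canned answers so the control flow can run without a real model. The single worked example embedded in `PARSE_PROMPT` shows what in-context learning means in this setting: the model is steered by a demonstration inside the prompt rather than by fine-tuning.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical LLM interface: any function mapping a prompt string to a completion.
LLM = Callable[[str], str]

# Hypothetical prompt templates (not the paper's prompts). Each ends with the input
# text on its own final line; PARSE_PROMPT carries one in-context example.
REWRITE_PROMPT = "Rewrite this CTI report as one attack behavior per line:\n{text}"
PARSE_PROMPT = ("Extract subject|action|object triples, one per line.\n"
                "Example: The attacker ran a script. -> attacker|run|script\n{text}")
IDENTIFY_PROMPT = "Give the MITRE ATT&CK technique ID for this behavior:\n{text}"
SUMMARIZE_PROMPT = "Previous state: {state}\nSummarize the attack state after:\n{text}"


@dataclass
class Step:
    triples: list[tuple[str, str, str]]  # behavior-graph edges
    technique: str                        # TTP label
    summary: str                          # state summary


def parse_triples(raw: str) -> list[tuple[str, str, str]]:
    """Keep only well-formed 'subject|action|object' lines from the model output."""
    triples = []
    for line in raw.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and all(parts):
            triples.append((parts[0], parts[1], parts[2]))
    return triples


def run_pipeline(report: str, llm: LLM) -> list[Step]:
    # Rewriter: normalize the report; here we assume one behavior per output line.
    rewritten = llm(REWRITE_PROMPT.format(text=report))
    behaviors = [line.strip() for line in rewritten.splitlines() if line.strip()]
    steps, state = [], "none"
    for behavior in behaviors:
        # Parser: behavior -> graph triples.
        triples = parse_triples(llm(PARSE_PROMPT.format(text=behavior)))
        # Identifier: behavior -> technique label.
        technique = llm(IDENTIFY_PROMPT.format(text=behavior)).strip()
        # Summarizer: fold the new behavior into the running attack state.
        state = llm(SUMMARIZE_PROMPT.format(state=state, text=behavior)).strip()
        steps.append(Step(triples, technique, state))
    return steps


# Stand-in model with canned answers, used only to exercise the control flow.
CANNED = {
    "The attacker sent a phishing email.": ("attacker|send|phishing email", "T1566"),
    "The victim opened a macro document.": ("victim|open|macro document", "T1204"),
}


def fake_llm(prompt: str) -> str:
    text = prompt.rsplit("\n", 1)[-1]  # templates put the input on the last line
    if prompt.startswith("Rewrite"):
        return "\n".join(CANNED)
    if prompt.startswith("Extract"):
        return CANNED[text][0]
    if prompt.startswith("Give"):
        return CANNED[text][1]
    return f"after: {text}"


steps = run_pipeline("Phishing email led to macro execution.", fake_llm)
```

In this sketch, replacing `fake_llm` with a function that calls a real model is enough to try actual prompts, although `parse_triples` would then need to tolerate much messier free-form output. Keeping each module as one prompt means modules can be inspected and adjusted independently, which is in the spirit of avoiding custom model design.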