MultiKG: Multi-Source Threat Intelligence Aggregation for High-Quality Knowledge Graph Representation of Attack Techniques

Jian Wang, Tiantian Zhu, Chunlin Xiong, Yan Chen
2024-11-13
Abstract: The construction of attack technique knowledge graphs aims to transform various types of attack knowledge into structured representations for more effective attack procedure modeling. Existing methods typically rely on textual data, such as Cyber Threat Intelligence (CTI) reports, which are often coarse-grained and unstructured, resulting in incomplete and inaccurate knowledge graphs. To address these issues, we expand attack knowledge sources by incorporating audit logs and static code analysis alongside CTI reports, providing finer-grained data for constructing attack technique knowledge graphs. We propose MultiKG, a fully automated framework that integrates multiple threat knowledge sources. MultiKG processes data from CTI reports, dynamic logs, and static code separately, then merges them into a unified attack knowledge graph. Through system design and the utilization of the Large Language Model (LLM), MultiKG automates the analysis, construction, and merging of attack graphs across these sources, producing a fine-grained, multi-source attack knowledge graph. We implemented MultiKG and evaluated it using 1,015 real attack techniques and 9,006 attack intelligence entries from CTI reports. Results show that MultiKG effectively extracts attack knowledge graphs from diverse sources and aggregates them into accurate, comprehensive representations. Through case studies, we demonstrate that our approach directly benefits security tasks such as attack reconstruction and detection.
Cryptography and Security
What problem does this paper attempt to address?
This paper addresses limitations in existing methods for constructing attack technique knowledge graphs. Specifically:

1. **Dispersed and Incomplete Attack Knowledge**: Existing attack knowledge graphs usually rely on a single information source, such as cyber threat intelligence (CTI) reports. These sources are often coarse-grained and unstructured, so the resulting knowledge graphs are incomplete and inaccurate.
2. **Lack of Fine-Grained Information**: Existing methods do not fully use fine-grained sources such as audit logs and static code analysis, and therefore cannot accurately represent the specific details of attack behaviors.
3. **Difficulty in Integrating Multi-Source Information**: Collecting attack information from different sources (threat intelligence, dynamic logs, static code) and fusing it into a unified, complete attack knowledge graph remains a challenge.

To address these problems, the paper proposes a framework named MultiKG. The framework expands the sources of attack knowledge by adding audit logs and static code analysis, which provide finer-grained knowledge for constructing attack technique knowledge graphs. MultiKG automates the processing of threat intelligence across sources, the construction of per-source knowledge graphs, and the fusion of multi-source information, and produces a high-quality, multi-source attack technique knowledge graph.

### Main Contributions of MultiKG

- **Multi-Source Threat Knowledge Collection and Aggregation**: It combines three information sources (threat intelligence, static code analysis, and dynamic log analysis) to capture the details of attack behaviors more comprehensively.
- **System Design and Implementation**: It proposes algorithms to extract attack technique graphs from large-scale audit logs, obtains attack node information by parsing static code into abstract syntax trees, and uses a large language model (LLM) to identify entities, entity types, and relationships in threat reports and build attack knowledge graphs from them.
- **Experimental Verification**: MultiKG is evaluated on datasets of real attack techniques and threat intelligence (1,015 attack techniques and 9,006 attack intelligence entries from CTI reports). The results show that it extracts attack knowledge graphs from the different sources and aggregates attack knowledge at the technique level into accurate and complete attack representations.

Together, these contributions let MultiKG represent attack behaviors more precisely and, as the paper's case studies show, benefit attack detection and reconstruction in practice.
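
To make the idea of per-source extraction followed by merging more concrete, here is a minimal Python sketch. It is not the authors' implementation: the triple format, the function names `static_code_edges` and `merge_graphs`, the file name `payload.py`, and the hand-written audit-log and CTI edges are all assumptions made for illustration. It parses a small script with Python's standard `ast` module to collect call nodes, then unions edges from three sources while recording which sources support each edge.

```python
import ast
from collections import defaultdict


def static_code_edges(source: str, script_name: str) -> set:
    """Return (subject, relation, object) triples, one per call in the code.

    Hypothetical helper: the summary only says that node information is
    obtained by parsing static code through abstract syntax trees.
    """
    edges = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            # ast.unparse (Python 3.9+) renders the callee, e.g. "os.system"
            edges.add((script_name, "calls", ast.unparse(node.func)))
    return edges


def merge_graphs(graphs: dict) -> dict:
    """Union edges from all sources, mapping each edge to its supporting sources.

    Assumption: edges match when their lowercased triples are equal. A real
    system would need a more careful entity-alignment step.
    """
    merged = defaultdict(set)
    for source_name, edges in graphs.items():
        for subj, rel, obj in edges:
            merged[(subj.lower(), rel.lower(), obj.lower())].add(source_name)
    return dict(merged)


# Illustrative inputs only; none of this data comes from the paper.
code = "import os\nos.system('whoami')\n"
graphs = {
    "static_code": static_code_edges(code, "payload.py"),
    "audit_log": {
        ("payload.py", "calls", "os.system"),
        ("payload.py", "spawns", "whoami"),
    },
    "cti_report": {("payload.py", "spawns", "whoami")},
}
unified = merge_graphs(graphs)

for edge, sources in sorted(unified.items()):
    print(edge, sorted(sources))
```

Keeping the set of supporting sources on each edge is one simple way to show where the sources agree and where one source contributes detail the others lack, which is the motivation the summary gives for aggregating multiple sources.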