Uncovering CWE-CVE-CPE Relations with Threat Knowledge Graphs

Zhenpeng Shi, Nikolay Matyunin, Kalman Graffi, David Starobinski
DOI: https://doi.org/10.1145/3641819
Impact factor: 2.717
2024-02-05
ACM Transactions on Privacy and Security
Abstract: Security assessment relies on public information about products, vulnerabilities, and weaknesses. So far, databases in these categories have rarely been analyzed in combination. Yet, doing so could help predict unreported vulnerabilities and identify common threat patterns. In this article, we propose a methodology for producing and optimizing a knowledge graph that aggregates knowledge from common threat databases (CVE, CWE, and CPE). We apply the threat knowledge graph to predict associations between threat databases, specifically between products, vulnerabilities, and weaknesses. We evaluate the prediction performance both in closed world with associations from the knowledge graph and in open world with associations revealed afterward. Using rank-based metrics (i.e., Mean Rank, Mean Reciprocal Rank, and Hits@N scores), we demonstrate the ability of the threat knowledge graph to uncover many associations that are currently unknown but will be revealed in the future, which remains useful over different time periods. We propose approaches to optimize the knowledge graph and show that they indeed help in further uncovering associations. We have made the artifacts of our work publicly available.
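The rank-based metrics named in the abstract can be illustrated with a minimal Python sketch. It makes several assumptions: a TransE-style score (negative L1 distance between h + r and t), hand-set toy embeddings that were not trained, the raw (unfiltered) ranking setting, and ties resolved in favor of the true entity. The function names are hypothetical. This is not the authors' evaluation code.

```python
import numpy as np

def transe_score(h, r, t):
    """TransE-style plausibility: negative L1 distance between h + r and t.
    Higher scores mean a more plausible triple (h, r, t)."""
    return -np.abs(h + r - t).sum(axis=-1)

def rank_of_true_tail(h, r, candidate_tails, true_idx):
    """1-based rank of the true tail among all candidates (raw, unfiltered setting).
    Ties are resolved in favor of the true tail: only strictly higher scores count."""
    scores = transe_score(h, r, candidate_tails)
    return 1 + int((scores > scores[true_idx]).sum())

def rank_metrics(ranks, ns=(1, 3, 10)):
    """Mean Rank (lower is better), Mean Reciprocal Rank and Hits@N (higher is better)."""
    ranks = np.asarray(ranks, dtype=float)
    metrics = {"MR": float(ranks.mean()), "MRR": float((1.0 / ranks).mean())}
    for n in ns:
        # Fraction of test triples whose true entity is ranked within the top n
        metrics[f"Hits@{n}"] = float((ranks <= n).mean())
    return metrics

# Toy example with hand-set 2-D embeddings (hypothetical, not trained)
h = np.array([0.0, 0.0])                                 # e.g. a CVE entity
r = np.array([1.0, 0.0])                                 # e.g. a CVE -> CWE relation
tails = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])   # candidate CWE entities
print(rank_of_true_tail(h, r, tails, true_idx=0))        # 1
print(rank_metrics([1, 2, 5, 20]))  # MR = 7.0, MRR ≈ 0.4375, Hits@1 = 0.25, Hits@3 = 0.5, Hits@10 = 0.75
```

Mean Rank is sensitive to a few very poor ranks. MRR and Hits@N focus on whether the true association appears near the top, which is what matters when an assessor inspects only the highest-ranked predictions.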
computer science, information systems
What problem does this paper attempt to address?
The problem this paper addresses is the lack of correlation analysis across existing threat databases (such as CVE, CWE, and CPE). This gap makes it difficult to predict unreported vulnerabilities and identify common threat patterns. The authors point out that although these databases provide public information about products, vulnerabilities, and weaknesses, this information is usually isolated and rarely analyzed in combination. They therefore propose a method based on a threat knowledge graph that integrates and optimizes knowledge from the different threat databases in order to predict hidden associations, in particular those among products, vulnerabilities, and weaknesses.

### Specific description of the problem

1. **Limitations of existing threat databases**:
   - Existing threat databases (such as CVE, CWE, and CPE) provide a large amount of public information, but this information is often isolated and lacks effective combined analysis.
   - Associations between these databases are usually established manually, which is time-consuming and prone to missing important associations.
2. **The need to predict unreported vulnerabilities and identify common threat patterns**:
   - Predicting unreported vulnerabilities helps security assessors discover potential risks early and take preventive measures.
   - Identifying common threat patterns clarifies the nature of security threats and supports more effective protection strategies.

### Solutions proposed in the paper

To address these problems, the authors propose the following:

1. **Constructing a threat knowledge graph**:
   - Convert the entries in the CVE, CWE, and CPE databases and their associations into (entity, relation, entity) triples, forming a unified knowledge representation.
   - Map these triples into a vector space using knowledge graph embedding techniques for link prediction.
2. **Optimizing the knowledge graph**:
   - Reduce redundancy by merging CPE entries that share all properties except the version number.
   - Remove CPE and CVE entries that have no association information, improving the quality of the knowledge graph.
3. **Predicting hidden associations**:
   - Train embedding models (such as TransE, DistMult, and ComplEx) on the knowledge graph to predict associations that are currently unknown but may be revealed in the future.
   - Verify the models' prediction performance through closed-world and open-world evaluation experiments.

### Main contributions

1. **Proposing and implementing the threat knowledge graph**:
   - Integrate the entries of different threat databases and their associations into a unified knowledge graph.
   - Optimize the knowledge graph to improve its predictive ability.
2. **Applying knowledge graph embedding techniques for link prediction**:
   - Compare multiple embedding models (such as TransE, DistMult, and ComplEx) and show that TransE performs best on this task.
3. **Evaluating the prediction ability of the threat knowledge graph in different scenarios**:
   - Conduct extensive evaluations in closed-world and open-world settings to verify the effectiveness of the model.
   - Show that the model retains its predictive ability across different time periods.
4. **Exploring further optimizations of the knowledge graph**:
   - Further improve prediction by removing obsolete entries and incorporating data from other databases (such as CAPEC entries and CVSS vectors).

In conclusion, by constructing and optimizing a threat knowledge graph, this paper provides an effective method for predicting unreported vulnerabilities and identifying common threat patterns, thereby strengthening security assessment.
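The graph-construction and CPE-merging steps described under "Solutions proposed in the paper" can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline. The input schema, the relation names `hasWeakness` and `hasVulnerability`, the direction of each relation, and both helper functions are hypothetical. CPE parsing also ignores escaped colons. Merging is approximated by replacing the CPE 2.3 version field with a wildcard.

```python
def version_agnostic_cpe(cpe):
    """Wildcard the version field of a CPE 2.3 formatted string so that entries
    differing only in version collapse to one node.
    Simplification (assumption): the string contains no escaped colons."""
    fields = cpe.split(":")
    if len(fields) < 6 or fields[:2] != ["cpe", "2.3"]:
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe!r}")
    fields[5] = "*"  # layout: cpe:2.3:part:vendor:product:VERSION:...
    return ":".join(fields)

def build_threat_triples(cve_records, merge_versions=True):
    """Turn CVE records into a set of (head, relation, tail) triples.
    cve_records: {cve_id: {"cwes": [...], "cpes": [...]}} (hypothetical schema).
    Relation names are illustrative, not the paper's vocabulary."""
    triples = set()
    for cve_id, rec in cve_records.items():
        for cwe_id in rec.get("cwes", []):
            triples.add((cve_id, "hasWeakness", cwe_id))
        for cpe in rec.get("cpes", []):
            product = version_agnostic_cpe(cpe) if merge_versions else cpe
            # The set removes duplicates created when merged CPEs coincide
            triples.add((product, "hasVulnerability", cve_id))
    # CVEs with no CWE or CPE association contribute no triples,
    # so they never become nodes in the graph.
    return triples

# Made-up example records (hypothetical IDs and products)
records = {
    "CVE-0000-0001": {
        "cwes": ["CWE-79"],
        "cpes": ["cpe:2.3:a:acme:widget:1.0:*:*:*:*:*:*:*",
                 "cpe:2.3:a:acme:widget:1.1:*:*:*:*:*:*:*"],
    },
    "CVE-0000-0002": {"cwes": [], "cpes": []},  # no associations
}
print(len(build_threat_triples(records)))                        # 2
print(len(build_threat_triples(records, merge_versions=False)))  # 3
```

With merging enabled, the two `widget` versions collapse into one product node. This mirrors the redundancy reduction described in the optimization step. The resulting triples are the kind of input a knowledge graph embedding model such as TransE is trained on.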