Abstract: Vulnerabilities in software security can remain undiscovered even after being exploited. Linking attacks to vulnerabilities helps experts identify and respond promptly to an incident. This paper introduces VULDAT, a classification tool that uses the sentence transformer MPNet to identify system vulnerabilities from attack descriptions. Our model was applied to 100 attack techniques from the ATT&CK repository and 685 issues from the CVE repository. We then compare the performance of VULDAT against eight other state-of-the-art classifiers based on sentence transformers. Our findings indicate that our model achieves the best performance, with an F1 score of 0.85, a Precision of 0.86, and a Recall of 0.83. Furthermore, we found that 56% of the CVE-reported vulnerabilities associated with an attack were identified by VULDAT, and 61% of the identified vulnerabilities were in the CVE repository.

What problem does this paper attempt to address?

The paper addresses the problem of automatically correlating attack techniques (AT) with known software vulnerabilities recorded as Common Vulnerabilities and Exposures (CVE). Specifically, the authors developed a tool named VULDAT, which analyzes attack description texts to identify and link the relevant CVE reports. The problem matters for three reasons:

1. **Improving response speed**: By automatically correlating attack techniques and CVE reports, security experts can identify and respond to security incidents more quickly.
2. **Enhancing efficiency**: Manually correlating a large number of attack techniques and CVE reports is cumbersome and error-prone, and an automated tool can significantly reduce that effort.
3. **Strengthening security**: Accurate correlation helps organizations better understand potential security threats and take effective defensive measures.

### Research Background

As the frequency and complexity of cyber-attacks continue to increase, Cyber Threat Intelligence (CTI) becomes increasingly important. The MITRE Corporation maintains multiple repositories, such as ATT&CK, CAPEC, CWE, and CVE, that record and classify attack patterns and vulnerabilities. However, the information in these repositories is usually isolated and has to be correlated manually, which is both time-consuming and error-prone.

### Main Research Questions

The paper mainly answers two research questions (RQs):

- **RQ1**: How do sentence transformer models perform in detecting software vulnerabilities from attack texts?
- **RQ2**: How many CVE issues can VULDAT correctly detect?

### Method Overview

To achieve the above goals, the authors designed VULDAT, a tool based on a sentence transformer. The specific steps are:

1. **Data collection**: Collect attack descriptions and vulnerability reports from MITRE's ATT&CK, CAPEC, CWE, and CVE repositories.
2. **Pre-processing**: Clean and standardize the attack description texts, including removing URLs, references, and stop words.
3. **Design VULDAT architecture**: Use a pre-trained sentence transformer model (such as MPNet) to generate embedding vectors for attack descriptions and CVE reports.
4. **Calculate similarity scores**: Determine the most relevant CVE reports by calculating the cosine similarity between the embedding vectors.
5. **Performance evaluation**: Evaluate the performance of VULDAT using metrics such as precision, recall, and F1-score.

### Main Contributions

- Developed the VULDAT tool, providing an automated method for detecting software vulnerabilities.
- Created a new annotated mapping data set that explicitly connects ATT&CK with vulnerabilities in the MITRE repositories.
- Conducted a comparative analysis of multiple sentence transformer models, showing how their performance differs under different pre-processing conditions.

### Results

According to the experimental results, VULDAT performs well under both partial and full pre-processing. With the multi-qa-mpnet-base-dot-v1 model in particular, the F1-score reaches 0.85, the precision 0.86, and the recall 0.83. In addition, as the abstract states, VULDAT identified 56% of the CVE-reported vulnerabilities associated with an attack, and 61% of the vulnerabilities it identified were in the CVE repository. The average Jaccard similarity is 0.40.

### Future Work

The authors plan to explore other models (such as sequence-to-sequence models) to improve CVE detection. They also plan to conduct large-scale manual inspections to verify VULDAT's output and to recommend missing links between attacks and vulnerabilities to the MITRE committee.
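The pre-processing, similarity, and evaluation steps from the Method Overview can be illustrated with a minimal Python sketch. This is not the authors' implementation. The sentence-transformer embedding is replaced by a plain term-count vector so the example runs with the standard library alone. The stop-word list, the regular expressions, the `0.5` threshold, the placeholder identifiers (`CVE-A`, `CVE-B`, `CVE-C`), and all function names are assumptions made for illustration. The sketch also assumes that the Jaccard similarity is taken between the predicted and the ground-truth CVE sets.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (a real one is much larger).
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "and", "or", "is", "by", "for"}


def preprocess(text: str) -> list[str]:
    """Lowercase, drop URLs and citation markers, tokenize, remove stop words."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    text = re.sub(r"\(citation:[^)]*\)", " ", text)
    tokens = re.findall(r"[a-z0-9]+", text)
    return [t for t in tokens if t not in STOP_WORDS]


def embed(text: str) -> Counter:
    """Stand-in for a sentence-transformer embedding: a sparse term-count vector.
    A real pipeline would encode the text with a pre-trained model instead."""
    return Counter(preprocess(text))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_cves(attack: str, cves: dict[str, str], threshold: float) -> list[tuple[str, float]]:
    """Score every CVE text against the attack text; keep those at or above the threshold."""
    attack_vec = embed(attack)
    scored = [(cve_id, cosine(attack_vec, embed(desc))) for cve_id, desc in cves.items()]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def evaluate(predicted: set[str], actual: set[str]) -> dict[str, float]:
    """Precision, recall, F1, and Jaccard similarity between two sets of CVE ids."""
    tp = len(predicted & actual)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    union = predicted | actual
    jaccard = tp / len(union) if union else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "jaccard": jaccard}


# Toy data: placeholder ids and invented descriptions, not real CVE records.
attack_text = ("Adversaries exploit a buffer overflow in a web server to execute "
               "arbitrary code (Citation: Example Report) https://example.com/ref")
cve_texts = {
    "CVE-A": "Buffer overflow in web server allows remote attackers to execute arbitrary code",
    "CVE-B": "Cross-site scripting in login form allows injection of script",
    "CVE-C": "Heap overflow in image parser allows arbitrary code execution",
}
ground_truth = {"CVE-A", "CVE-C"}  # hypothetical annotated links

ranking = rank_cves(attack_text, cve_texts, threshold=0.5)
metrics = evaluate({cve_id for cve_id, _ in ranking}, ground_truth)
print(ranking)
print(metrics)
```

On this toy data only `CVE-A` clears the threshold, so precision is perfect while recall is halved. Lowering the threshold would admit `CVE-C` as well, which shows how the similarity cut-off trades precision against recall.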
