Abstract: Successful cyber-attacks exploit vulnerabilities in the software and/or hardware of systems deployed on premises or in the cloud. Although hundreds of vulnerabilities are discovered every year, only a small fraction of them are actually exploited, so there is a severe class imbalance between exploited and non-exploited vulnerabilities. The National Vulnerability Database, the largest open repository that indexes and maintains all known vulnerabilities, assigns a unique identifier to each vulnerability. Each registered vulnerability also receives a severity score based on the impact it could have if exploited. Recent research has shown that the CVSS score is not the only factor in whether a vulnerability is selected for exploitation, and that other attributes in the National Vulnerability Database can be used effectively as predictive features to identify the most exploitable vulnerabilities. Since cybersecurity management is highly resource-intensive, organizations such as cloud providers benefit when the vulnerabilities in their software or hardware that are most likely to be exploited can be predicted as accurately and reliably as possible, so that the available resources are spent on fixing those first. Existing research has developed vulnerability exploitation prediction models that address the class imbalance through algorithmic and artificial data resampling techniques, but these models still overfit heavily to the majority class, which makes them unreliable in practice. In this research, we have designed a novel cost function to address the class imbalance. We have also used the large text corpus available in the extracted dataset to train a custom word vector that better captures the context of the local text data, for use as an embedding layer in neural networks.
Our vulnerability exploitation prediction models, powered by the novel cost function and the custom-trained word vector, achieve accuracy, precision, recall, F1-score and AUC values of 0.92, 0.89, 0.98, 0.94 and 0.97, respectively, thereby outperforming existing models while overcoming the overfitting caused by class imbalance.
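
The idea of handling class imbalance through the cost function can be illustrated with a standard class-weighted binary cross-entropy. This is a minimal sketch of a generic cost-sensitive loss, not the novel cost function developed in this work. The function names `class_weights` and `weighted_bce`, the inverse-frequency weighting, and the toy data are assumptions chosen for illustration only.

```python
import math


def class_weights(labels):
    """Inverse-frequency weights (assumed scheme): n / (2 * count of class)."""
    n = len(labels)
    pos = sum(labels)
    neg = n - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    return {0: n / (2 * neg), 1: n / (2 * pos)}


def weighted_bce(labels, probs, weights, eps=1e-12):
    """Mean binary cross-entropy with a per-class weight on each sample."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        total += weights[y] * loss
    return total / len(labels)


# Toy data: 1 exploited vulnerability among 10 (hypothetical labels).
labels = [1] + [0] * 9
# A degenerate model that predicts "not exploited" for everything.
probs = [0.01] * 10

weights = class_weights(labels)
plain = weighted_bce(labels, probs, {0: 1.0, 1: 1.0})
weighted = weighted_bce(labels, probs, weights)
print(f"unweighted loss: {plain:.4f}, class-weighted loss: {weighted:.4f}")
```

With the class weights applied, the missed minority-class sample dominates the loss, so a model that always predicts the majority class is penalized more heavily than under the unweighted loss.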

Vulnerability Time Series Prediction Based on Multivariable LSTM

Towards More Practical Automation of Vulnerability Assessment

Vulnerability Severity Prediction Model for Software Based on Markov Chain.

Vulnerability Detection for Source Code Using Contextual LSTM

Vulnerability Forecasting: In theory and practice

Learning-based Models for Vulnerability Detection: An Extensive Study

Early and Realistic Exploitability Prediction of Just-Disclosed Software Vulnerabilities: How Reliable Can It Be?

An Improved Vulnerability Exploitation Prediction Model with Novel Cost Function and Custom Trained Word Vector Embedding

Combining Software Metrics and Text Features for Vulnerable File Prediction

Are Latent Vulnerabilities Hidden Gems for Software Vulnerability Prediction? An Empirical Study

An approach for predicting multiple-type overflow vulnerabilities based on combination features and a time series neural network algorithm

Forecasting severity of software vulnerability using grey model GM(1,1)

Explaining the Contributing Factors for Vulnerability Detection in Machine Learning

Joint prediction on security event and time interval through deep learning

Predicting Missing Information of Key Aspects in Vulnerability Reports

A Novel Vulnerability Prediction Model to Predict Vulnerability Loss Based on Probit Regression

An empirical study of text-based machine learning models for vulnerability detection

Predicting Severity of Software Vulnerability Based on Grey System Theory.

BACKTIME: Backdoor Attacks on Multivariate Time Series Forecasting

FastEmbed: Predicting vulnerability exploitation possibility based on ensemble machine learning algorithm

Predicting Exploitation of Disclosed Software Vulnerabilities Using Open-source Data