Abstract: Detecting software vulnerabilities requires critical attention during the development phase to make software secure and less vulnerable. Vulnerable software invites hackers to perform malicious activities and disrupt its operation, which leads to millions in financial losses for software companies. To reduce these losses, security communities have introduced many reliable and effective vulnerability detection systems that aim to detect software vulnerabilities as early as the development or testing phases. Existing surveys of software vulnerability detection systems discuss conventional and data mining approaches, which are widely used and mostly consist of traditional detection techniques. However, these surveys lack discussion of newly trending machine learning approaches, such as supervised learning and deep learning techniques. Furthermore, existing studies fail to discuss how research interest in the software vulnerability detection community has grown over the years. With more discussion of this, we can predict and focus on the research problems in software vulnerability detection that most urgently need to be addressed. Aiming to reduce these gaps, this paper presents a taxonomy of research interests in software vulnerability detection, such as methods, detection, features, code and dataset. The research interest categories exhibit current trends in software vulnerability detection. The analysis shows that there is considerable interest in addressing methods and detection problems, while only a few studies address code and dataset problems. This indicates that much work remains to be done on code and dataset problems. Furthermore, this paper extends the taxonomy of machine learning approaches used to detect software vulnerabilities, such as supervised learning, semi-supervised learning, ensemble learning and deep learning.
Based on the analysis, supervised learning and deep learning approaches are trending in the software vulnerability detection community, as these techniques can effectively detect vulnerabilities such as buffer overflow, SQL injection and cross-site scripting, with F1 scores of up to 95%. Finally, this paper concludes by discussing potential future work in software vulnerability detection in terms of datasets, multi-vulnerability detection, transfer learning and real-world applications.

Using software metrics for predicting vulnerable classes in Java and Python based systems

Combining Software Metrics and Text Features for Vulnerable File Prediction

Predicting Vulnerable Components via Text Mining or Software Metrics? An Effort-Aware Perspective

Examining the Relationship of Code and Architectural Smells with Software Vulnerabilities

A metric for software vulnerabilities classification

Software security with natural language processing and vulnerability scoring using machine learning approach

LEOPARD: Identifying Vulnerable Code for Vulnerability Assessment through Program Metrics

Predicting Exploitation of Disclosed Software Vulnerabilities Using Open-source Data

The rise of software vulnerability: Taxonomy of software vulnerabilities detection and machine learning approaches

On the Use of Fine-grained Vulnerable Code Statements for Software Vulnerability Assessment Models

Automated software vulnerability detection with machine learning

Mitigating Access Control Vulnerabilities through Interactive Static Analysis

An Empirical Study on Bug Severity Estimation using Source Code Metrics and Static Analysis

Predicting Vulnerability In Large Codebases With Deep Code Representation

Vulnerability Severity Prediction Model for Software Based on Markov Chain

A Historical and Statistical Study of the Software Vulnerability Landscape

Software Defect Prediction Framework Using Hybrid Software Metric

An empirical study of text-based machine learning models for vulnerability detection

Machine Learning Techniques for Python Source Code Vulnerability Detection

Automated Code-centric Software Vulnerability Assessment: How Far Are We? An Empirical Study in C/C++

A multi-target approach to estimate software vulnerability characteristics and severity scores