Vulnerability Detection with Representation Learning.

Zhiqiang Wang, Sulong Meng, Ying Chen
DOI: https://doi.org/10.1007/978-981-99-0272-9_8
2022-01-01
Abstract: It is essential to identify potentially vulnerable code in software systems. Deep neural network techniques have been used for vulnerability detection. However, existing methods usually ignore the feature representation of vulnerability datasets, resulting in unsatisfactory model performance. A vulnerability detection technique should achieve high accuracy, a relatively high true-positive rate, and a low false-negative rate. At the same time, it should be able to detect vulnerabilities in actual projects without requiring additional expert knowledge or tedious configuration. In this article, we propose and implement VDDRL (A Vulnerability Detection Method Based On Deep Representation Learning), a deep representation learning-based vulnerability detection method that combines feature extraction and ensemble learning. VDDRL uses the word2vec model to convert source code into a vector representation. Deep representations of vulnerable code are learned from vulnerable code token sequences using LSTM models, and these representations are then used to train traditional machine learning algorithms for classification. The training dataset is derived from actual projects and contains seven different types of vulnerabilities. In comparative experiments on these datasets, VDDRL achieves an Accuracy of 95.6%–98.7%, a Precision of 91.6%–99.0%, a Recall of 84.7%–99.5%, and an F1 of 88.1%–99.2%. These results are better than those of the baseline method. Our experimental results show that VDDRL is a generic, lightweight, and extensible vulnerability detection method that offers better performance and robustness than other methods.
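The four metrics reported in the abstract follow the standard definitions for binary classification. As a minimal sketch, assuming labels where 1 marks a vulnerable sample and 0 a non-vulnerable one (the function names and the toy labels below are illustrative and are not taken from the paper), they can be computed as follows:

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # vulnerable samples predicted vulnerable
    fp: int  # non-vulnerable samples predicted vulnerable
    tn: int  # non-vulnerable samples predicted non-vulnerable
    fn: int  # vulnerable samples predicted non-vulnerable


def confusion(y_true, y_pred):
    """Count the four outcomes for binary labels (1 = vulnerable)."""
    pairs = list(zip(y_true, y_pred))
    return Confusion(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
    )


def metrics(c):
    """Return accuracy, precision, recall, and F1; empty denominators give 0.0."""
    total = c.tp + c.fp + c.tn + c.fn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels for illustration only; these are not data from the paper.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
result = metrics(confusion(y_true, y_pred))
print(result)
```

Recall here is the true-positive rate mentioned in the abstract, and the false-negative rate is its complement (1 minus recall), so a high recall and a low false-negative rate are the same requirement stated two ways.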