Automated Software Vulnerability Detection Via Pre-trained Context Encoder and Self Attention

Na Li, Haoyu Zhang, Zhihui Hu, Guang Kou, Huadong Dai
DOI: https://doi.org/10.1007/978-3-031-06365-7_15
2022-01-01
Abstract: With the increasing size and complexity of modern software projects, it is almost impossible to discover all software vulnerabilities in time through manual analysis. Most existing vulnerability detection methods rely on manually designed vulnerability features, an approach that is costly and leads to high false positive rates. Pre-trained models for programming languages, which further take the syntactic-level structure of code into account, have brought dramatic improvements to code-related tasks. We therefore propose an automated vulnerability detection method based on a pre-trained context encoder and a self-attention mechanism. Rather than following current static analysis approaches, we treat program source code as natural language and introduce a pre-trained contextualized language model to capture local program dependencies and learn better contextualized representations. The extracted source code feature vectors are then fed into a purpose-built Self Attention Networks (SAN) module. We build the SAN module from a Long Short-Term Memory (LSTM) model and self-attention, which lets it learn long-range dependencies around vulnerable program points more efficiently. We conduct experiments on two source-code-level C program benchmark datasets, using four evaluation metrics to compare the vulnerability detection performance of different systems. Extensive experimental results demonstrate that our proposed model outperforms the previous state-of-the-art automated vulnerability detection method by around 7.2% in F1-measure and 2.6% in precision.
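The abstract describes the SAN module only at a high level (an LSTM followed by self-attention over encoder features). The following is a minimal pure-Python sketch of the self-attention and pooling step, under loudly stated assumptions: the input vectors stand in for LSTM hidden states computed over pre-trained encoder features (the encoder and the LSTM themselves are omitted), the query/key/value projections are identity matrices, and the final score comes from a hypothetical linear head (`w`, `b`). It shows standard scaled dot-product self-attention, not the authors' exact formulation.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(hidden: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention with identity Q/K/V projections (an assumption).

    `hidden` is assumed to hold one vector per token, e.g. LSTM outputs over
    pre-trained encoder features. Every position attends to every position,
    so distant tokens can influence each other in a single step.
    """
    d = len(hidden[0])
    out = []
    for q in hidden:
        # Attention weights of this query over all keys; they sum to 1.
        weights = softmax([dot(q, k) / math.sqrt(d) for k in hidden])
        # Output is the weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, hidden)) for j in range(d)])
    return out


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def classify(hidden: List[Vector], w: Vector, b: float) -> float:
    """Hypothetical head: attend, mean-pool over tokens, then a logistic score."""
    attended = self_attention(hidden)
    pooled = [sum(col) / len(attended) for col in zip(*attended)]
    return sigmoid(dot(w, pooled) + b)


if __name__ == "__main__":
    # Hypothetical hidden states for a three-token code fragment.
    hidden = [[0.1, 0.3], [0.9, -0.2], [0.4, 0.4]]
    print(classify(hidden, w=[1.0, -1.0], b=0.0))
```

Because each position reaches every other position in one attention step, the path between two distant tokens does not grow with their distance, unlike the recurrent path of an LSTM; this is the usual motivation for pairing the two.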