Abstract: Vulnerability detection is essential to protecting software systems. Various deep learning approaches have been proposed to learn vulnerability patterns and identify them. Although these approaches have shown great potential, they still suffer from two issues: (1) They struggle to distinguish vulnerability-related information from a large amount of irrelevant information, which hinders their effectiveness in capturing vulnerability features. (2) They are less effective on long code because many neural models limit the input length, which hinders their ability to represent long vulnerable code snippets. To mitigate these two issues, we propose decomposing the syntax-based Control Flow Graph (CFG) of a code snippet into multiple execution paths to detect vulnerabilities. Specifically, given a code snippet, we first build its CFG from its Abstract Syntax Tree (AST), which we refer to as a syntax-based CFG, and decompose the CFG into multiple paths from the entry node to the exit node. Next, we adopt a pre-trained code model and a convolutional neural network to learn the path representations with intra- and inter-path attention. The feature vectors of the paths are combined into the representation of the code snippet and fed into a classifier to detect vulnerabilities. Decomposing the code snippet into multiple paths filters out some redundant information unrelated to the vulnerability and helps the model focus on vulnerability features. In addition, since the decomposed paths are usually shorter than the code snippet, information located at the tail of long code is more likely to be processed and learned. To evaluate the effectiveness of our model, we build a dataset of over 231k code snippets, of which 24k are vulnerable. Experimental results demonstrate that the proposed approach outperforms state-of-the-art baselines by at least 22.30%, 42.92%, and 32.58% in terms of Precision, Recall, and F1-Score, respectively. Further analysis investigates the reasons behind the proposed approach's superiority.
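
The path decomposition step described in the abstract can be illustrated with a minimal sketch. It assumes the CFG is already available as an adjacency list and enumerates entry-to-exit paths by depth-first search. The function name `enumerate_paths`, the toy graph, and the `max_visits` bound on how often a node may recur within one path are all assumptions made for illustration; the abstract does not say how loops are handled or how the CFG is stored.

```python
from typing import Dict, List


def enumerate_paths(
    cfg: Dict[str, List[str]],
    entry: str,
    exit_node: str,
    max_visits: int = 1,
) -> List[List[str]]:
    """Enumerate entry-to-exit paths of a CFG given as an adjacency list.

    max_visits is a hypothetical bound (not taken from the paper): each node
    may appear at most max_visits times in a single path, which keeps the
    enumeration finite when the CFG contains loops.
    """
    paths: List[List[str]] = []
    visits: Dict[str, int] = {}

    def dfs(node: str, path: List[str]) -> None:
        if visits.get(node, 0) >= max_visits:
            return  # node already used up on the current path
        visits[node] = visits.get(node, 0) + 1
        path.append(node)
        if node == exit_node:
            paths.append(list(path))  # record a completed path
        else:
            for successor in cfg.get(node, []):
                dfs(successor, path)
        path.pop()  # backtrack
        visits[node] -= 1

    dfs(entry, [])
    return paths


# Toy CFG (hypothetical): a guard followed by a loop with one body statement.
toy_cfg = {
    "entry": ["check"],
    "check": ["loop", "exit"],
    "loop": ["body", "exit"],
    "body": ["loop"],
    "exit": [],
}

acyclic_paths = enumerate_paths(toy_cfg, "entry", "exit")
unrolled_paths = enumerate_paths(toy_cfg, "entry", "exit", max_visits=2)

for p in unrolled_paths:
    print(" -> ".join(p))
```

With `max_visits=1` the loop body never appears on a complete path, because reaching the exit from `body` requires revisiting `loop`; allowing two visits unrolls the loop once. In this sketch, each resulting path is a shorter statement sequence that could be encoded separately, which is the property the abstract relies on for long code.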

Vulnerability Detection by Learning from Syntax-Based Execution Paths of Code
