Meta-Path Based Attentional Graph Learning Model for Vulnerability Detection

Xin-Cheng Wen, Cuiyun Gao, Jiaxin Ye, Yichen Li, Zhihong Tian, Yan Jia, Xuan Wang
2023-12-11
Abstract: In recent years, deep learning (DL)-based methods have been widely used in code vulnerability detection. The DL-based methods typically extract structural information from source code, e.g., the code structure graph, and adopt neural networks such as Graph Neural Networks (GNNs) to learn the graph representations. However, these methods fail to consider the heterogeneous relations in the code structure graph, i.e., that different types of edges connect different types of nodes, and this omission may obstruct graph representation learning. Besides, these methods are limited in capturing long-range dependencies due to the deep levels in the code structure graph. In this paper, we propose a Meta-path based Attentional Graph learning model for code vulNErability deTection, called MAGNET. MAGNET constructs a multi-granularity meta-path graph for each code snippet, in which the heterogeneous relations are denoted as meta-paths to represent the structural information. A meta-path based hierarchical attentional graph neural network is also proposed to capture the relations between distant nodes in the graph. We evaluate MAGNET on three public datasets and the results show that MAGNET outperforms the best baseline method in terms of F1 score by 6.32%, 21.50%, and 25.40%, respectively. MAGNET also achieves the best performance among all the baseline methods in detecting the Top-25 most dangerous Common Weakness Enumerations (CWEs), further demonstrating its effectiveness in vulnerability detection.
Software Engineering
### What problems does this paper attempt to solve?

This paper addresses two main limitations of existing deep learning (DL) methods for code vulnerability detection:

1. **Ignoring heterogeneous relationships in the code structure graph**:
   - Existing deep-learning-based methods usually extract structural information from source code and use graph neural networks (GNNs) to learn graph representations. However, these methods do not consider the heterogeneous relationships in the code structure graph, that is, the fact that different types of edges connect different types of nodes. This omission may degrade graph representation learning.
   - Heterogeneous relationships are the relationships between different types of nodes and edges. Modeling them enriches the node representations and thus contributes to more accurate vulnerability detection.
2. **Difficulty in capturing long-distance dependencies**:
   - Most deep-learning-based methods, including the state-of-the-art ones, use GNNs to capture the relationships between nodes in the code structure graph. However, GNNs are limited in modeling relationships between distant nodes because they rely mainly on neighborhood aggregation for message passing.
   - Because the code structure graph has many nodes and a deep hierarchy, methods that apply GNNs directly still have difficulty learning long-distance dependencies.

To solve these problems, the authors propose MAGNET (Meta-path based Attentional Graph learning model for code vulNErability deTection), a meta-path-based attentional graph learning model.
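
The long-distance limitation described above can be illustrated with a minimal sketch. It is not taken from the paper: the function name `receptive_field` and the six-node chain are hypothetical, and edges are treated as undirected for simplicity. The sketch shows that each round of neighborhood aggregation extends a node's reach by only one hop, so a dependency between nodes that are k hops apart needs at least k layers to be seen.

```python
from collections import defaultdict


def receptive_field(edges, start, num_layers):
    """Return the nodes whose information can reach `start`
    after `num_layers` rounds of neighborhood aggregation."""
    neighbors = defaultdict(set)
    for u, v in edges:
        # Assumption: edges are undirected for this illustration.
        neighbors[u].add(v)
        neighbors[v].add(u)

    reached = {start}
    frontier = {start}
    for _ in range(num_layers):
        # One aggregation round extends the reach by exactly one hop.
        frontier = {n for node in frontier for n in neighbors[node]} - reached
        reached |= frontier
    return reached


# Hypothetical chain of six statements: 0 - 1 - 2 - 3 - 4 - 5
chain = [(i, i + 1) for i in range(5)]

print(sorted(receptive_field(chain, start=0, num_layers=2)))  # [0, 1, 2]
print(sorted(receptive_field(chain, start=0, num_layers=5)))  # [0, 1, 2, 3, 4, 5]
```

With two layers, node 0 never receives information from node 5. Simply stacking more layers is one way to widen the reach, but the summary above describes a different route for MAGNET: an attention-based architecture.
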
MAGNET improves on existing methods in two ways:

- **Multi-granularity meta-path graph construction**: To exploit the heterogeneous relationships in the code structure graph, MAGNET constructs a multi-granularity meta-path graph in which heterogeneous relationships are expressed as meta-paths that carry the structural information.
- **Meta-path-based hierarchical attentional graph neural network**: A meta-path-based hierarchical attentional graph neural network (MHAGNN) is proposed to capture the relationships between distant nodes in the graph.

On three public datasets, MAGNET improves the F1 score over the best baseline method by 6.32%, 21.50%, and 25.40%, respectively. It also achieves the best performance among all baseline methods in detecting the Top-25 most dangerous Common Weakness Enumerations (CWEs), further supporting its effectiveness in vulnerability detection.

### Summary

The main contributions of this paper are:

1. Proposing MAGNET, a new meta-path-based attentional graph learning model that captures heterogeneous relationships in the code structure graph.
2. Proposing a meta-path-based hierarchical attentional graph neural network (MHAGNN) that learns the representation of each meta-path and captures long-distance dependencies.
3. Verifying the effectiveness of MAGNET in code vulnerability detection through extensive experiments, and publicly releasing the code and experimental data to promote future research.
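
The two ideas named above, typing edges by meta-path and weighting meta-path representations with attention, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' MHAGNN. It assumes a meta-path can be encoded as a (source node type, edge type, target node type) triple. The node and edge type names, the toy graph, and the helpers `group_by_meta_path` and `attention_combine` are all hypothetical. The attention here is a plain dot-product softmax chosen for brevity.

```python
import math
from collections import defaultdict

# Hypothetical typed graph: node id -> node type (names are assumptions).
node_types = {0: "Statement", 1: "Expression", 2: "Symbol", 3: "Statement"}
# Hypothetical typed edges: (source, edge type, target).
edges = [(0, "AST", 1), (1, "AST", 2), (0, "CFG", 3), (3, "DFG", 2)]


def group_by_meta_path(node_types, edges):
    """Group edges by (source node type, edge type, target node type)."""
    groups = defaultdict(list)
    for src, edge_type, dst in edges:
        meta_path = (node_types[src], edge_type, node_types[dst])
        groups[meta_path].append((src, dst))
    return dict(groups)


def attention_combine(embeddings, query):
    """Softmax-weighted sum of per-meta-path embeddings.

    Each embedding is scored by its dot product with `query`;
    the scores are normalized with a softmax to obtain the weights.
    """
    scores = [sum(q * e for q, e in zip(query, emb)) for emb in embeddings]
    max_score = max(scores)  # subtract the max for numerical stability
    exp_scores = [math.exp(s - max_score) for s in scores]
    total = sum(exp_scores)
    weights = [e / total for e in exp_scores]
    combined = [
        sum(w * emb[i] for w, emb in zip(weights, embeddings))
        for i in range(len(query))
    ]
    return weights, combined


groups = group_by_meta_path(node_types, edges)
for meta_path, pairs in groups.items():
    print(meta_path, pairs)

# Two made-up meta-path embeddings and a made-up query vector.
weights, combined = attention_combine([[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0])
print(weights, combined)
```

Grouping edges by meta-path lets each relation type be processed separately before the results are merged. The attention step then determines how much each meta-path contributes to the final representation.
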