Learning Graph-based Patch Representations for Identifying and Assessing Silent Vulnerability Fixes

Mei Han, Lulu Wang, Jianming Chang, Bixin Li, Chunguang Zhang
2024-09-13
Abstract: Software projects are dependent on many third-party libraries, therefore high-risk vulnerabilities can propagate through the dependency chain to downstream projects. Owing to the subjective nature of patch management, software vendors commonly fix vulnerabilities silently. Silent vulnerability fixes cause downstream software to be unaware of urgent security issues in a timely manner, posing a security risk to the software. Presently, most of the existing works for vulnerability fix identification only consider the changed code as a sequential textual sequence, ignoring the structural information of the code. In this paper, we propose GRAPE, a GRAph-based Patch rEpresentation that aims to 1) provide a unified framework for getting vulnerability fix patches representation; and 2) enhance the understanding of the intent and potential impact of patches by extracting structural information of the code. GRAPE employs a novel joint graph structure (MCPG) to represent the syntactic and semantic information of fix patches and embeds both nodes and edges. Subsequently, a carefully designed graph convolutional neural network (NE-GCN) is utilized to fully learn structural features by leveraging the attributes of the nodes and edges. Moreover, we construct a dataset containing 2251 silent fixes. For the experimental section, we evaluated patch representation on three tasks, including vulnerability fix identification, vulnerability types classification, and vulnerability severity classification. Experimental results indicate that, in comparison to baseline methods, GRAPE can more effectively reduce false positives and omissions of vulnerability fixes identification and provide accurate vulnerability assessments.
Software Engineering
What problem does this paper attempt to address?
The paper addresses the following problem: **how to automatically identify and assess silent vulnerability fixes so that downstream software projects can learn of, and respond to, security issues in upstream software in a timely manner**. Existing methods for identifying vulnerability fixes mainly treat code changes as sequential text and ignore the structural information of the code. This leads to a high false-negative rate and makes it difficult to fully understand the impact of a fix. In addition, because software vendors often fix vulnerabilities without disclosure (i.e., silent fixes), downstream developers are not informed of these important security updates in time, which increases the security risk to their software.

To solve these problems, the authors propose a graph-based patch representation framework, **GRAPE**, which aims to:

1. Provide a unified framework for obtaining representations of vulnerability fix patches;
2. Enhance the understanding of patch intent and potential impact by extracting the structural information of the code.

### Main contributions

- **MCPG (Multi-Component Program Graph)**: A graph representation that combines syntactic and semantic information to represent silent fix patches.
- **NE-GCN (Node and Edge-aware Graph Convolutional Network)**: A specially designed graph convolutional neural network that learns structural features by passing and aggregating both node and edge features.
- **Experimental evaluation**: The effectiveness of GRAPE was evaluated on a newly constructed dataset across three tasks: vulnerability fix identification, vulnerability type classification, and vulnerability severity classification.

### Key technologies

1. **Merging CPGs**: Merge the CPGs of the defective code and the fixed code into an MCPG, retaining important structural information and reducing noise.
2. **Graph Embedding**: Embed the nodes and edges of the MCPG into feature vectors. Node embeddings take into account code fragments and node types; edge embeddings take into account version and type information.
3. **GCN Learning**: Use NE-GCN for message passing and feature aggregation to capture the relationships between nodes, and integrate the features of multi-hop neighbors through multi-layer iterative propagation.

### Experimental results

Experiments show that, compared with six existing state-of-the-art deep-learning methods, GRAPE performs well on all metrics. In the silent vulnerability fix identification task, the accuracy and F1-score of GRAPE increase by 7.1% and 6.55%, respectively. In the multi-class vulnerability type classification task, the MCC and F1-score increase by 10.26% and 12.57%, respectively. In the multi-class vulnerability severity classification task, the MCC and F1-score increase by 14.19% and 14.98%, respectively.

In conclusion, by proposing the graph-based patch representation framework GRAPE, this research improves the accuracy of vulnerability fix identification and assessment, which helps to improve the efficiency of software security management.
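
The three steps listed under Key technologies (merging CPGs, embedding edges by version and type, and edge-aware message passing) can be illustrated with a toy sketch. This is not the authors' implementation. The `Edge` class, the version labels `pre`/`post`/`both`, the edge kinds `AST`/`CFG`/`DDG`, the one-hot edge embedding, and the mean-aggregation update rule are all assumptions chosen for illustration; the summary above does not specify the actual MCPG merging rules or the NE-GCN formula.

```python
from dataclasses import dataclass

# Hypothetical label sets; the real MCPG may use different ones.
VERSIONS = ("pre", "post", "both")
KINDS = ("AST", "CFG", "DDG")


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    kind: str  # one of KINDS


def merge_cpgs(pre_edges, post_edges):
    """Merge the edge sets of the pre-fix and post-fix graphs.

    Returns {edge: version}, where version records whether the edge
    appears only before the fix, only after it, or in both.
    """
    pre, post = set(pre_edges), set(post_edges)
    merged = {}
    for e in pre | post:
        if e in pre and e in post:
            merged[e] = "both"
        elif e in pre:
            merged[e] = "pre"
        else:
            merged[e] = "post"
    return merged


def embed_edge(version, kind):
    """One-hot encode the edge version followed by the edge kind."""
    return ([float(version == v) for v in VERSIONS]
            + [float(kind == k) for k in KINDS])


def propagate(node_feats, edge_feats):
    """One round of edge-aware mean aggregation (illustrative only).

    Each edge sends (source features + edge features) to its target.
    A node averages its incoming messages, then averages that result
    with its own features. Nodes without incoming edges are unchanged.
    Assumes node and edge feature vectors have the same length.
    """
    dim = len(next(iter(node_feats.values())))
    sums = {n: [0.0] * dim for n in node_feats}
    counts = {n: 0 for n in node_feats}
    for e, ef in edge_feats.items():
        msg = [h + x for h, x in zip(node_feats[e.src], ef)]
        sums[e.dst] = [s + m for s, m in zip(sums[e.dst], msg)]
        counts[e.dst] += 1
    out = {}
    for n, h in node_feats.items():
        if counts[n] == 0:
            out[n] = list(h)
        else:
            out[n] = [0.5 * (x + s / counts[n]) for x, s in zip(h, sums[n])]
    return out


# Toy patch: the fix reroutes the data flow into "c" through a new node "d".
pre_fix = [Edge("a", "b", "CFG"), Edge("b", "c", "DDG")]
post_fix = [Edge("a", "b", "CFG"), Edge("a", "d", "CFG"), Edge("d", "c", "DDG")]

merged = merge_cpgs(pre_fix, post_fix)
edge_feats = {e: embed_edge(v, e.kind) for e, v in merged.items()}
node_feats = {n: [0.0] * 6 for n in "abcd"}  # placeholder node embeddings
updated = propagate(node_feats, edge_feats)
```

Calling `propagate` repeatedly on its own output is the toy analogue of stacking layers: after k rounds, each node's vector reflects neighbors up to k hops away. In a trained model the fixed averaging would be replaced by learned weight matrices and nonlinearities.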