Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation

Yufan Zhuang, Sahil Suneja, Veronika Thost, Giacomo Domeniconi, Alessandro Morari, Jim Laredo
DOI: https://doi.org/10.48550/arXiv.2109.03341
2021-09-08
Abstract: Identifying vulnerable code is a precautionary measure against software security breaches. Tedious expert effort has been spent building static analyzers, yet insecure patterns are rarely enumerated in full. This work explores a deep learning approach that learns insecure patterns automatically from code corpora. Because parsing naturally yields graph structures for code, we develop a novel graph neural network (GNN) that exploits both the semantic context and the structural regularity of a program to improve prediction performance. Compared with a generic GNN, our enhancements include a synthesis of the multiple representations learned from a program's several parsed graphs, and a new training loss that leverages the fine granularity of the labels. Our model outperforms multiple text-, image-, and graph-based approaches on two real-world datasets.
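The abstract's two enhancements, synthesizing representations learned from several parsed graphs of one program and a loss that exploits fine-grained labels, can be illustrated with a toy sketch. It is not the authors' model, and the abstract does not specify their loss. Everything here is a hypothetical stand-in: the weight-free averaging in `propagate`, the mean-pooling readout, concatenation as the synthesis step, and `combined_loss` (function-level binary cross-entropy plus a `lam`-weighted node-level term). The graph names `ast`/`cfg`, the features, and the classifier weights are likewise made up. A real GNN would also learn weight matrices and apply nonlinearities at each layer.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Edge = Tuple[int, int]
Graph = Tuple[List[Vector], List[Edge]]  # (node feature vectors, edge list)


def propagate(features: List[Vector], edges: List[Edge], rounds: int = 2) -> List[Vector]:
    """Weight-free message passing: each node repeatedly averages itself and its neighbours."""
    neighbours: Dict[int, List[int]] = {i: [i] for i in range(len(features))}
    for src, dst in edges:  # edges are treated as undirected in this sketch
        neighbours[src].append(dst)
        neighbours[dst].append(src)
    h = [list(v) for v in features]
    for _ in range(rounds):
        h = [
            [sum(h[j][k] for j in neighbours[i]) / len(neighbours[i]) for k in range(len(h[i]))]
            for i in range(len(h))
        ]
    return h


def mean_readout(h: List[Vector]) -> Vector:
    """Average node states into one graph-level embedding (assumes at least one node)."""
    return [sum(v[k] for v in h) / len(h) for k in range(len(h[0]))]


def synthesize(graphs: Dict[str, Graph]) -> Vector:
    """Concatenate one embedding per parsed graph, in sorted name order for determinism."""
    program_vec: Vector = []
    for name in sorted(graphs):
        feats, edges = graphs[name]
        program_vec.extend(mean_readout(propagate(feats, edges)))
    return program_vec


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def bce(p: float, y: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one prediction, with p clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def combined_loss(p_func: float, y_func: int, p_nodes: List[float], y_nodes: List[int],
                  lam: float = 0.5) -> float:
    """Hypothetical loss: function-level BCE plus lam times the mean node-level BCE."""
    node_term = sum(bce(p, y) for p, y in zip(p_nodes, y_nodes)) / len(y_nodes)
    return bce(p_func, y_func) + lam * node_term


if __name__ == "__main__":
    # Toy program with two parsed views; all features and weights are made up.
    graphs: Dict[str, Graph] = {
        "ast": ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [(0, 1), (0, 2)]),
        "cfg": ([[0.5, 0.5], [1.0, 0.0]], [(0, 1)]),
    }
    vec = synthesize(graphs)
    weights = [0.3, -0.2, 0.1, 0.4]  # hypothetical linear classifier
    p_vuln = sigmoid(sum(w * x for w, x in zip(weights, vec)))
    # Node-level labels, e.g. which statements were flagged as vulnerable.
    print(len(vec), combined_loss(p_vuln, 1, [0.9, 0.2], [1, 0]))
```

Concatenation keeps each graph's embedding in its own slice of the program vector, so a downstream classifier can weigh the parsed views separately; averaging the per-graph embeddings would be cheaper but would lose that separation.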
Artificial Intelligence, Software Engineering
What problem does this paper attempt to address?