Can Deep Learning Models Learn the Vulnerable Patterns for Vulnerability Detection?

Guoqing Yan, Sen Chen, Yude Bai, Xiaohong Li
DOI: https://doi.org/10.1109/compsac54236.2022.00142
2022-01-01
Abstract: Deep learning has been widely applied to vulnerability prediction. However, even when a deep learning model performs well, it is difficult to explain how it reaches its predictions, and equally difficult to determine which parts of the source code this black-box model concentrates on. To this end, we present an empirical evaluation that explores how deep learning models predict vulnerabilities and whether they precisely capture the critical code segments that represent vulnerable patterns. First, we build a new vulnerability dataset, called Juliet+, based on the Juliet Test Suite, in which the vulnerability-related code lines of both positive (bad) and negative (good) samples are manually labeled with substantial effort. We then implement four deep learning models with attention mechanisms to detect vulnerabilities by mining vulnerable patterns from source code. We conduct extensive experiments to evaluate the effectiveness of these four models and to analyze their interpretability using evaluation metrics such as Hit@k. The experimental results reveal that deep learning models with attention can, to some extent, focus on the vulnerability-related code segments that help interpret the results of vulnerability detection, especially when the graph neural network model is adopted. We further investigate the factors that affect model interpretability, including the class distribution, the number of samples, and the differences in sample features. We find that the graph neural network model performs better on the part of the dataset that contains balanced and sufficient samples with obvious differences between vulnerable and non-vulnerable patterns.
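The abstract names Hit@k as an interpretability metric but does not define it. As a minimal sketch, assuming the common reading that a sample counts as a hit when at least one of the k code lines with the highest attention scores is among the manually labeled vulnerability-related lines, the metric could be computed as below. The function names, the per-line score representation, and the zero-based line indices are illustrative assumptions, not the authors' implementation.

```python
from typing import Iterable, Sequence, Set, Tuple


def hit_at_k(scores: Sequence[float], labeled_lines: Set[int], k: int) -> bool:
    """Return True if any of the k highest-scored lines is a labeled line.

    scores[i] is a hypothetical attention weight for code line i (zero-based);
    labeled_lines holds the indices marked as vulnerability-related.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Rank line indices from highest to lowest attention score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return any(i in labeled_lines for i in ranked[:k])


def mean_hit_at_k(samples: Iterable[Tuple[Sequence[float], Set[int]]], k: int) -> float:
    """Fraction of samples that score a hit at the given k (assumed aggregation)."""
    hits = [hit_at_k(scores, labeled, k) for scores, labeled in samples]
    return sum(hits) / len(hits) if hits else 0.0
```

For instance, with scores `[0.05, 0.10, 0.60, 0.25]` and line 3 labeled, the sample misses at k=1 (the top line is 2) but hits at k=2. The paper itself may rank code segments or graph nodes rather than individual lines.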