An Empirical Evaluation of Deep Learningā€based Source Code Vulnerability Detection: Representation Versus Models

Abubakar Omari Abdallah Semasaba, Wei Zheng, Xiaoxue Wu, Samuel Akwasi Agyemang, Tao Liu, Yuan Ge
DOI: https://doi.org/10.1002/smr.2422
2022-01-01
Journal of Software Evolution and Process
Abstract: Vulnerabilities in software source code are critical issues in software engineering. Coping with them is becoming more challenging because of factors such as code complexity and volume. Deep learning has gained popularity over the years as a means of addressing such issues. This paper evaluates vulnerability detection performance across source code representations and assesses how machine learning (ML) strategies can improve that performance. The experiment combines three deep neural networks (DNNs) with five source code representations: abstract syntax trees (ASTs), code gadgets (CGs), semantics-based vulnerability candidates (SeVCs), lexed code representations (LCRs), and composite code representations (CCRs). Experimental results show that applying different ML strategies on top of the base model structure affects performance to varying degrees. However, ML-based techniques perform poorly at class imbalance handling and dimensionality reduction when used in conjunction with source code representations.