Software Defect Prediction Based on Deep Representation Learning of Source Code From Contextual Syntax and Semantic Graph

Ahmed Abdu, Zhengjun Zhai, Hakim A. Abdo, Redhwan Algabri
DOI: https://doi.org/10.1109/tr.2024.3354965
IF: 5.883
2024-01-01
IEEE Transactions on Reliability
Abstract: Software defect prediction approaches play an essential role in the software development life cycle, helping developers predict defects early and thus avoid wasted time and effort. Defect prediction techniques based on semantic features have recently outperformed approaches based on traditional features. Existing semantic-feature-based defect prediction approaches use a single source code representation. Most studies focus on contextual syntax represented by abstract syntax trees, and some use a control flow graph to represent code as a graph. However, a single representation remains limited when predicting defects that span multiple function calls, and it is prone to a high rate of false positives. To close the gap between source code representations in software defect prediction, we propose a defect prediction model based on multiple source code representations. The proposed model is a deep hierarchical convolutional neural network (DH-CNN). The syntax features extracted from abstract syntax trees using Word2vec are fed into a syntax-level DH-CNN, and the semantic-graph features extracted from the control flow graph and data dependence graph using Node2vec are fed into a semantic-level DH-CNN. In addition, the proposed model includes a gated merging mechanism that combines the outputs of the two DH-CNNs by estimating the combination ratio of the two feature types. Experimental results indicate that DH-CNN outperforms existing methods under both cross-project and within-project scenarios.
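The gated merging step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the code below assumes a common form of gating: a sigmoid gate computed from the concatenated syntax-level and semantic-level vectors, used as a per-dimension mixing ratio. The function name gated_merge and the parameters weights and bias are hypothetical and stand in for learned parameters; this is not the authors' implementation.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_merge(h_syntax, h_semantic, weights, bias):
    """Blend two equal-length feature vectors with a per-dimension gate.

    Assumed form (not taken from the paper):
        g      = sigmoid(W . [h_syntax ; h_semantic] + b)
        merged = g * h_syntax + (1 - g) * h_semantic

    weights has one row per output dimension, each of length 2 * d;
    bias has length d. Both are placeholders for learned parameters.
    """
    d = len(h_syntax)
    if len(h_semantic) != d:
        raise ValueError("both feature vectors must have the same length")
    concat = list(h_syntax) + list(h_semantic)
    gates, merged = [], []
    for i in range(d):
        z = sum(w * x for w, x in zip(weights[i], concat)) + bias[i]
        g = sigmoid(z)  # combination ratio for dimension i
        gates.append(g)
        merged.append(g * h_syntax[i] + (1.0 - g) * h_semantic[i])
    return merged, gates


# Usage: with zero weights and bias the gate is 0.5, so the result is the
# plain average of the two vectors.
h_syn = [1.0, 2.0]
h_sem = [3.0, 4.0]
zero_w = [[0.0] * 4 for _ in range(2)]
merged, gates = gated_merge(h_syn, h_sem, zero_w, [0.0, 0.0])
```

In this sketch, a gate value near 1 favors the syntax-level features and a value near 0 favors the semantic-graph features; in a trained model the gate would be learned jointly with the two networks.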
engineering, electrical & electronic; computer science, software engineering, hardware & architecture