Line-Level Defect Prediction by Capturing Code Contexts with Graph Convolutional Networks

Shouyu Yin, Shikai Guo, Hui Li, Chenchen Li, Rong Chen, Xiaochen Li, He Jiang
DOI: https://doi.org/10.1109/tse.2024.3503723
IF: 7.4
2024-01-01
IEEE Transactions on Software Engineering
Abstract: Software defect prediction refers to the systematic analysis and review of software using various approaches and tools to identify potential defects or errors. It helps developers identify defects quickly and allocate development resources more effectively, thus enhancing software quality and reliability. Previous defect prediction approaches still face two main limitations: 1) lacking contextual semantic information and 2) ignoring the joint reasoning between different granularities of defect prediction. In response to these challenges, we propose LineDef, a line-level defect prediction approach that captures code contexts with graph convolutional networks. Specifically, LineDef comprises three components: the token embedding component, the graph extraction component, and the multi-granularity defect prediction component. The token embedding component maps each token to a vector to obtain a high-dimensional semantic feature representation of the token. Subsequently, the graph extraction component uses a sliding window to extract line-level and token-level graphs, addressing the challenge of capturing contextual semantic relationships in the code. Finally, the multi-granularity defect prediction component leverages graph convolutional layers and attention mechanisms to obtain prediction labels and risk scores, thereby achieving file-level and line-level defect prediction. Experimental studies on 32 datasets across 9 different software projects show that LineDef achieves significantly higher balanced accuracy, with improvements ranging from 15.61% to 45.20%, compared to state-of-the-art file-level defect prediction approaches, and a remarkable cost-effectiveness improvement, ranging from 15.32% to 278%, compared to state-of-the-art line-level defect prediction approaches. These results demonstrate that the LineDef approach can extract more comprehensive information from lines of code for defect prediction.
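To make the sliding-window graph extraction step more concrete, the following is a minimal sketch of how a window over a sequence can be turned into a co-occurrence graph. It is not the authors' implementation: the abstract does not specify the window size, how edges are weighted, or how line-level nodes are defined, so the function name `sliding_window_edges`, the default `window=3`, and the count-based edge weights are all assumptions made for illustration.

```python
from collections import defaultdict


def sliding_window_edges(items, window=3):
    """Hypothetical helper: link items that share a window of consecutive positions.

    Returns a dict mapping an undirected edge (a, b), with a <= b, to the
    number of windows in which both items appear. The weighting scheme is an
    assumption; the abstract only says that a sliding window is used.
    """
    edges = defaultdict(int)
    # Slide the window one position at a time; a short sequence yields one window.
    for start in range(max(1, len(items) - window + 1)):
        span = items[start:start + window]
        for i in range(len(span)):
            for j in range(i + 1, len(span)):
                if span[i] != span[j]:  # skip self-loops for repeated items
                    a, b = sorted((span[i], span[j]))
                    edges[(a, b)] += 1
    return dict(edges)


# Token-level graph: nodes are the tokens of one line of code.
tokens = ["int", "x", "=", "y", "+", "1"]
token_graph = sliding_window_edges(tokens, window=3)

# Line-level graph: nodes are line indices within a file (assumed representation).
line_graph = sliding_window_edges(list(range(4)), window=2)
```

In this sketch the same routine serves both granularities: applied to tokens it links nearby tokens within a line, and applied to line indices it links neighboring lines. A graph convolutional layer would then operate on the resulting adjacency structure together with the token embeddings.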