Exploring better alternatives to size metrics for explainable software defect prediction

Chenchen Chai, Guisheng Fan, Huiqun Yu, Zijie Huang, Jianshu Ding, Yao Guan
DOI: https://doi.org/10.1007/s11219-023-09656-y
2023-12-30
Software Quality Journal
Abstract: Delivering reliable software under the constraint of limited time and budget is a significant challenge. Recent progress in software defect prediction is helping developers locate defect-prone code components and allocate quality assurance resources more efficiently. However, practitioners criticize defect predictors from academia as impractical because they rely heavily on size metrics such as lines of code (LOC), which over-abstract technical details and provide limited insight for software maintenance. Thus, the performance of predictors may be overstated. In response, based on a state-of-the-art defect prediction model, we (1) exclude size metrics and evaluate the impact on performance, (2) include new features such as network dependency metrics, and (3) explore which ones are better alternatives to size metrics using explainable artificial intelligence (XAI) techniques. We find that excluding size metrics decreases model performance by 1.99% and 0.66% on AUC-ROC in within- and cross-project prediction, respectively. The results show that two of the network dependency metrics involved (i.e., Betweenness and pWeakC(out)) and four other code metrics (i.e., LCOM, AVG(CC), LCOM3, and CAM) could effectively preserve or improve the prediction performance, even if we exclude size metrics. In conclusion, we suggest discarding size metrics and involving the mentioned network dependency metrics for better performance and explainability.
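As a rough illustration of the comparison described in the abstract, the sketch below computes AUC-ROC for two sets of predicted defect probabilities, one from a model that uses size metrics and one from a model that does not. It is not the authors' implementation: the function name `auc_roc`, the variable names, and all labels and scores are hypothetical toy values chosen only to show how such a difference could be measured, and the abstract does not specify whether the reported drops are relative or absolute.

```python
def auc_roc(labels, scores):
    """AUC-ROC via pairwise comparison (Mann-Whitney U formulation).

    labels: 1 for a defective module, 0 for a clean one.
    scores: predicted defect probability for each module.
    Returns the fraction of (defective, clean) pairs in which the
    defective module receives the higher score; ties count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one defective and one clean module")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the paper.
labels = [1, 1, 1, 0, 0, 0]
scores_with_size = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
scores_without_size = [0.9, 0.7, 0.3, 0.5, 0.4, 0.1]

auc_with = auc_roc(labels, scores_with_size)
auc_without = auc_roc(labels, scores_without_size)

# Absolute difference in AUC-ROC between the two feature sets.
print(f"with size metrics:    {auc_with:.4f}")
print(f"without size metrics: {auc_without:.4f}")
print(f"absolute difference:  {auc_with - auc_without:.4f}")
```

The pairwise formulation is quadratic in the number of modules, which is fine for a small illustration; a rank-based implementation would be preferable for large datasets.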
computer science, software engineering