JIT-Smart: A Multi-task Learning Framework for Just-in-Time Defect Prediction and Localization

Xiangping Chen,Furen Xu,Yuan Huang,Neng Zhang,Zibin Zheng
DOI: https://doi.org/10.1145/3643727
2024-01-01
Abstract:Just-in-time defect prediction (JIT-DP) is used to predict the defect-proneness of a commit and just-in-time defect localization (JIT-DL) is used to locate the exact buggy positions (defective lines) in a commit. Recently, various JIT-DP and JIT-DL techniques have been proposed, while most of them use a post-mortem way (e.g., code entropy, attention weight, LIME) to achieve the JIT-DL goal based on the prediction results in JIT-DP. These methods do not utilize the label information of the defective code lines during model building. In this paper, we propose a unified model JIT-Smart, which makes the training process of just-in-time defect prediction and localization tasks a mutually reinforcing multi-task learning process. Specifically, we design a novel defect localization network (DLN), which explicitly introduces the label information of defective code lines for supervised learning in JIT-DL with considering the class imbalance issue. To further investigate the accuracy and cost-effectiveness of JIT-Smart, we compare JIT-Smart with 7 state-of-the-art baselines under 5 commit-level and 5 line-level evaluation metrics in JIT-DP and JIT-DL. The results demonstrate that JIT-Smart is statistically better than all the state-of-the-art baselines in JIT-DP and JIT-DL. In JIT-DP, at the median value, JIT-Smart achieves F1-Score of 0.475, AUC of 0.886, Recall@20%Effort of 0.823, Effort@20%Recall of 0.01 and Popt of 0.942 and improves the baselines by 19.89%-702.74%, 1.23%-31.34%, 9.44%-33.16%, 21.6%-53.82% and 1.94%-34.89%, respectively . In JIT-DL, at the median value, JIT-Smart achieves Top-5 Accuracy of 0.539 and Top-10 Accuracy of 0.396, Recall@20%Effort line of 0.726, Effort@20%Recall line of 0.087 and IFA line of 0.098 and improves the baselines by 101.83%-178.35%, 101.01%-277.31%, 257.88%-404.63%, 71.91%-74.31% and 99.11%-99.41%, respectively. Statistical analysis shows that our JIT-Smart performs more stably than the best-performing model. Besides, JIT-Smart also achieves the best performance compared with the state-of-the-art baselines in cross-project evaluation.
What problem does this paper attempt to address?