Just-in-Time Defect Prediction Technology Based on Interpretability Technology

Wei Zheng, Tianren Shen, Xiang Chen
DOI: https://doi.org/10.1109/dsa52907.2021.00017
2021-01-01
Abstract: In recent years, software defect prediction researchers have proposed Just-in-Time (JIT) defect prediction, which predicts whether each code change submitted by a developer is likely to introduce a defect. Because it works at the level of individual changes, its predictions are immediate and easy to trace back to the responsible change. However, the accuracy of JIT defect prediction is affected by class imbalance in the data set. Defects tend to be concentrated: roughly 80% of defects are found in about 20% of modules, and in most projects the changes that do not introduce defects far outnumber those that do. This imbalance between the minority (defect-inducing) and majority (clean) classes degrades the model's classification performance: the majority class of clean changes gives the model an artificially high prediction accuracy, so the expected results are hard to obtain in practice. Moreover, the data sets contain many irrelevant and redundant features, which increase the complexity of the prediction model. To improve the efficiency of JIT defect prediction, and to improve the interpretability and transparency of the model so that users can come to trust its decisions, we build a RandomForest defect prediction model and use several different types of change features to study six open-source projects from different domains. We use the LIME interpretability technique to partially explain the model, and we use these explanations to extract key features with the aim of reducing developers' workload as much as possible. Our results show that, by interpreting the defect prediction model and identifying its key features, 96% of the original effectiveness can be achieved with 45% of the original workload.
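The abstract does not spell out how the LIME explanations are turned into a reduced feature set. The stdlib-only Python sketch below shows one plausible reading, under stated assumptions: a simplified LIME-style local surrogate (Gaussian perturbations around a change, an exponential proximity kernel, and a weighted ridge regression) explains individual predictions, and the absolute local weights are summed across changes to rank features. It is not the `lime` package's API and not necessarily the authors' pipeline; `toy_model` is a hypothetical stand-in for a trained RandomForest's defect probability, and the feature names, kernel width, and sample count are illustrative assumptions.

```python
import math
import random


def weighted_ridge(X, y, w, lam=1e-3):
    """Solve (X^T W X + lam * I) beta = X^T W y with Gaussian elimination."""
    d = len(X[0])
    # Accumulate the weighted normal equations A @ beta = b.
    A = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = [0.0] * d
    for row, target, weight in zip(X, y, w):
        for i in range(d):
            b[i] += weight * row[i] * target
            for j in range(d):
                A[i][j] += weight * row[i] * row[j]
    # Forward elimination with partial pivoting.
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, d):
            factor = A[r][col] / A[col][col]
            for c in range(col, d):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    beta = [0.0] * d
    for i in reversed(range(d)):
        rest = sum(A[i][j] * beta[j] for j in range(i + 1, d))
        beta[i] = (b[i] - rest) / A[i][i]
    return beta


def explain_instance(predict_proba, x, n_samples=500, scale=1.0, seed=0):
    """Fit a local linear surrogate around x; return one weight per feature."""
    rng = random.Random(seed)
    width = 0.75 * math.sqrt(len(x))  # assumed kernel width (LIME-like heuristic)
    X, y, w = [], [], []
    for _ in range(n_samples):
        offsets = [rng.gauss(0.0, scale) for _ in x]
        z = [xi + oi for xi, oi in zip(x, offsets)]
        dist2 = sum(o * o for o in offsets)
        w.append(math.exp(-dist2 / width ** 2))  # closer samples weigh more
        X.append([1.0] + offsets)  # intercept column + feature offsets
        y.append(predict_proba(z))
    return weighted_ridge(X, y, w)[1:]  # drop the intercept


def rank_features(predict_proba, instances, feature_names, top_k):
    """Sum |local weights| over instances and keep the top_k feature names."""
    totals = [0.0] * len(feature_names)
    for x in instances:
        for i, weight in enumerate(explain_instance(predict_proba, x)):
            totals[i] += abs(weight)
    order = sorted(range(len(totals)), key=lambda i: totals[i], reverse=True)
    return [feature_names[i] for i in order[:top_k]]


# Hypothetical change features (standardized); not the paper's feature set.
FEATURES = ["lines_added", "files_touched", "author_experience"]


def toy_model(x):
    """Hypothetical stand-in for a RandomForest's P(defect); ignores x[2]."""
    return 1.0 / (1.0 + math.exp(-(1.5 * x[0] + 0.8 * x[1])))


if __name__ == "__main__":
    changes = [[0.0, 0.0, 0.0], [0.5, -0.5, 1.0], [-0.3, 0.2, -1.0]]
    print(rank_features(toy_model, changes, FEATURES, top_k=2))
```

Summing absolute weights treats a feature that pushes some changes toward "defective" and others toward "clean" as equally important; the aggregation rule and the number of features the authors actually keep are not stated in the abstract.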