Feature Importance in the Context of Traditional and Just-In-Time Software Defect Prediction Models

Susmita Haldar, Luiz Fernando Capretz
DOI: https://doi.org/10.1109/CCECE59415.2024.10667167
2024-11-08
Abstract: Software defect prediction models can assist software testing initiatives by prioritizing the testing of error-prone modules. In recent years, in addition to the traditional approach of predicting defects from classes, modules, etc., Just-In-Time defect prediction research, which focuses on the change history of software products, has been gaining prominence. For building these defect prediction models, it is important to understand which features are the primary contributors to these classifiers. This study developed defect prediction models using both the traditional and the Just-In-Time approaches from the publicly available dataset of the Apache Camel project. A multi-layer deep learning algorithm was applied to these datasets and compared with machine learning algorithms. The deep learning algorithm achieved accuracies of 80% and 86%, with area under the receiver operating characteristic curve (AUC) scores of 66% and 78%, for traditional and Just-In-Time defect prediction, respectively. Finally, the feature importance of these models was identified using a model-specific integrated gradient method and a model-agnostic Shapley Additive Explanation (SHAP) technique.
Software Engineering
What problem does this paper attempt to address?
The paper asks which features contribute most to the predictions of traditional software defect prediction models and Just-In-Time (JIT) software defect prediction models. Specifically, the research addresses two core issues:

1. **Performance comparison between traditional defect prediction models and Just-In-Time defect prediction models**:
   - The research compared the prediction accuracy and reliability of the two approaches when using deep learning algorithms.
   - In the experiments, the deep learning algorithm performed better in terms of accuracy on the larger data sets, but was slightly inferior to the random forest model in terms of the AUC (Area Under the Receiver Operating Characteristic Curve) metric.
2. **Feature importance analysis**:
   - The research aims to determine which features are the most important when constructing these defect prediction models and to explore whether the two kinds of model share important features.
   - Two methods were used to evaluate feature importance: the model-specific Integrated Gradient method and the model-agnostic SHAP (Shapley Additive Explanation) technique.

### Main findings

- **Traditional defect prediction models**:
  - Features such as `lcom3` (a variant of Lack of Cohesion in Methods), `avg cc` (Average Cyclomatic Complexity), and `cbm` (Coupling Between Methods) are identified as important.
  - The SHAP method shows that the total number of lines of code (LOC) and `lcom` (Lack of Cohesion in Methods) are the most important features.
- **Just-In-Time defect prediction models**:
  - Features such as whether the change is a defect fix (`Fix`), the distribution of modified code across files (`Entropy`), and the number of lines of code added (`LA`) are considered important.
  - The SHAP method indicates that the number of lines of code added (`LA`) and developer experience (`EXP`, `REXP`, `SEXP`) strongly influence the model's predictions.

### Summary

By comparing the performance of traditional and JIT defect prediction models and analyzing their feature importance, this research provides insights that can help test managers plan testing more effectively at different stages of the software life cycle. The research also shows the advantages of deep learning algorithms in handling larger data sets and emphasizes the importance of different feature selection methods.
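The performance comparison above reports that a model can lead on accuracy while trailing on AUC. The following minimal sketch shows why the two metrics can disagree on imbalanced defect data. The labels and scores are hypothetical toy values, not data from the paper, and the helper functions `accuracy` and `roc_auc` are written here for illustration rather than taken from the authors' code.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC in its rank form: the probability that a random defective item
    receives a higher score than a random clean one (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 8 clean modules (0) and 2 defective ones (1).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# Classifier A: gives every module the same low score and predicts "clean".
constant_scores = [0.1] * 10
constant_pred = [0] * 10

# Classifier B: tends to score defective modules higher; threshold at 0.5.
ranked_scores = [0.1, 0.2, 0.1, 0.3, 0.6, 0.2, 0.1, 0.4, 0.7, 0.45]
ranked_pred = [int(s >= 0.5) for s in ranked_scores]

print("A:", accuracy(y_true, constant_pred), roc_auc(y_true, constant_scores))
print("B:", accuracy(y_true, ranked_pred), roc_auc(y_true, ranked_scores))
```

On this toy data both classifiers reach an accuracy of 0.8, yet classifier A has an AUC of 0.5 (no ranking ability) while classifier B reaches 0.9375. Accuracy rewards agreeing with the majority class, whereas AUC measures how well defective items are ranked above clean ones, which is why the two metrics are worth reporting side by side.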
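The two attribution techniques named above can be sketched on a tiny model. This is an illustrative sketch under stated assumptions, not the authors' implementation: the logistic model, its weights, the input, and the all-zero baseline are hypothetical, and only the feature names echo the JIT metrics discussed above. The Shapley computation enumerates every feature subset exactly and replaces absent features with a single baseline value, whereas the SHAP library typically approximates this using a background dataset.

```python
import math
from itertools import combinations


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical logistic "defect" model over three JIT-style features.
# The weights and bias are invented for illustration only.
FEATURES = ["LA", "Entropy", "EXP"]
WEIGHTS = [1.2, 0.8, -0.5]
BIAS = -0.3


def predict(x):
    """Predicted defect probability for one feature vector."""
    return sigmoid(sum(w * v for w, v in zip(WEIGHTS, x)) + BIAS)


def integrated_gradients(x, baseline, steps=200):
    """Midpoint Riemann-sum approximation of integrated gradients along
    the straight path from the baseline to x (model-specific: it uses
    the model's gradient)."""
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (v - b) for v, b in zip(x, baseline)]
        p = predict(point)
        # Analytic gradient of the sigmoid output w.r.t. each input.
        for i, w in enumerate(WEIGHTS):
            totals[i] += p * (1.0 - p) * w
    return [(v - b) * t / steps for v, b, t in zip(x, baseline, totals)]


def shapley_values(x, baseline):
    """Exact Shapley values (model-agnostic: only predict() is called).
    Features outside a coalition are set to their baseline value."""
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


x = [1.0, 0.5, 2.0]         # hypothetical standardized code change
baseline = [0.0, 0.0, 0.0]  # hypothetical reference ("average") change

ig = integrated_gradients(x, baseline)
shap_like = shapley_values(x, baseline)
gap = predict(x) - predict(baseline)

for name, a, s in zip(FEATURES, ig, shap_like):
    print(f"{name:8s} IG={a:+.4f}  Shapley={s:+.4f}")
print(f"sum IG={sum(ig):+.4f}  sum Shapley={sum(shap_like):+.4f}  gap={gap:+.4f}")
```

Both sets of attributions sum (approximately for integrated gradients, exactly up to rounding for Shapley values) to the difference between the prediction for `x` and the prediction for the baseline. The individual values generally differ between the two methods, which is one reason a study might report both.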