Building Defect Prediction Models by Online Learning Considering Defect Overlooking

Nikolay Fedorov, Yuta Yamasaki, Masateru Tsunoda, Akito Monden, Amjed Tahir, Kwabena Ebo Bennin, Koji Toda, Keitaro Nakasai

2024-04-17

Abstract: Building defect prediction models based on online learning can enhance prediction accuracy. Online learning rebuilds the prediction model whenever a new data point is added. However, when a module is predicted as "non-defective", fewer test cases are written for it, so a defective module can be overlooked during testing. The erroneous test results are then used as learning data by online learning, which could negatively affect prediction accuracy. To suppress this negative influence, we propose applying a method that fixes the prediction as positive during the initial stage of online learning. Additionally, we improved the method to consider the probability of overlooking. In our experiment, we demonstrate this negative influence on prediction accuracy and the effectiveness of our approach. The results show that our approach did not negatively affect AUC but significantly improved recall.

Software Engineering

What problem does this paper attempt to address?

This paper addresses the loss of prediction accuracy caused by defect overlooking when a software defect prediction model is built with online learning. Specifically:

1. **The impact of defect overlooking**:
   - When a module is predicted as "non-defective", developers usually write fewer test cases for it in order to allocate test resources efficiently. As a result, a module that is actually defective may be overlooked during testing and recorded as non-defective, which corrupts the learning data of the prediction model.
   - Because online learning uses these erroneous test results as training data, they may reduce the accuracy of the prediction model.

2. **Two types of defect overlooking**:
   - **Type 1 overlooking**: When the prediction is "non-defective", existing defects may be overlooked because fewer test cases are written.
   - **Type 2 overlooking**: Even when the prediction is "defective" and a large number of tests are carried out, some defects may still go undetected.

3. **Solutions**:
   - The paper proposes a fixed prediction method: in the initial stage of online learning, a certain proportion of negative predictions are forcibly set to positive (that is, "non-defective" is changed to "defective") to reduce the impact of Type 1 overlooking.
   - The method is then improved to take the overlooking probability into account: when the overlooking rate is low, fixed prediction is stopped to avoid an excessive increase in false-positive predictions.

In the experiments, this method improves recall while minimizing the negative impact on AUC, precision, and F1-score.
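The two overlooking types and the fixed prediction rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the parameters `p_type1`, `p_type2`, `initial_steps`, `flip_ratio`, and `rate_threshold`, and the way the overlooking rate is passed in as a ready-made estimate are all hypothetical, since the summary does not give these details.

```python
import random


def observed_label(actual_defective, predicted_defective, p_type1, p_type2, rng):
    """Return the label that testing reports for one module.

    p_type1: assumed chance of missing a defect after a "non-defective" prediction.
    p_type2: assumed chance of missing a defect after a "defective" prediction.
    """
    if not actual_defective:
        return False  # no defect exists, so none can be found
    p_miss = p_type2 if predicted_defective else p_type1
    return rng.random() >= p_miss  # True means the defect was detected


def fixed_prediction(raw_prediction, step, initial_steps, flip_ratio,
                     overlook_rate, rate_threshold, rng):
    """Force some negative predictions to positive during the initial stage.

    All thresholds are hypothetical; overlook_rate is an assumed external estimate.
    """
    if raw_prediction:
        return True  # positive predictions are never changed
    in_initial_stage = step < initial_steps
    overlooking_is_high = overlook_rate >= rate_threshold
    if in_initial_stage and overlooking_is_high and rng.random() < flip_ratio:
        return True  # "non-defective" is overridden to "defective"
    return False


rng = random.Random(0)
# Early step, high assumed overlooking rate: every negative is flipped.
early = fixed_prediction(False, step=3, initial_steps=10, flip_ratio=1.0,
                         overlook_rate=0.5, rate_threshold=0.1, rng=rng)
# Same step, but the assumed overlooking rate is below the threshold.
low_rate = fixed_prediction(False, step=3, initial_steps=10, flip_ratio=1.0,
                            overlook_rate=0.05, rate_threshold=0.1, rng=rng)
print(early, low_rate)  # prints: True False
```

In an online-learning loop, the label returned by `observed_label` (rather than the true label) would be what the model is retrained on, which is how an overlooked defect ends up as an erroneous training example.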

Software Defect Prediction by Online Learning Considering Defect Overlooking

Unifying Defect Prediction, Categorization, and Repair by Multi-Task Deep Learning

The Impact of Defect (Re) Prediction on Software Testing

An Improved Semi-Supervised Learning Method for Software Defect Prediction.

Studying the effectiveness of deep active learning in software defect prediction

Deep Learning for Just-In-Time Defect Prediction

An Empirical Study of the Impact of Test Strategies on Online Optimization for Ensemble-Learning Defect Prediction

Combined Classifier for Cross-Project Defect Prediction: an Extended Empirical Study.

Optimized Kernel-Based Conformal Predictor for Online Fault Detection

The Integrity of Machine Learning Algorithms against Software Defect Prediction

Effort-aware Just-in-time Defect Prediction: Simple Unsupervised Models Could Be Better Than Supervised Models.

A Software Defect Prediction Method That Simultaneously Addresses Class Overlap and Noise Issues after Oversampling

The Impact of Dormant Defects on Defect Prediction: A Study of 19 Apache Projects

Deep Learning-Based Defect Prediction for Mobile Applications

Understanding machine learning software defect predictions

An investigation of online and offline learning models for online Just-in-Time Software Defect Prediction

Towards An Online Incremental Approach to Predict Students Performance

A New Improved Prediction of Software Defects Using Machine Learning-based Boosting Techniques with NASA Dataset

Cross‐version Defect Prediction Using Threshold‐based Active Learning

Enhancing Defect Prediction with Static Defect Analysis.