Building Defect Prediction Models by Online Learning Considering Defect Overlooking

Nikolay Fedorov, Yuta Yamasaki, Masateru Tsunoda, Akito Monden, Amjed Tahir, Kwabena Ebo Bennin, Koji Toda, Keitaro Nakasai
2024-04-17
Abstract: Building defect prediction models based on online learning can enhance prediction accuracy. Online learning rebuilds the prediction model each time a new data point is added. However, when a module is predicted as "non-defective", fewer test cases may be written for it, so a defective module can be overlooked during testing. The erroneous test results are then used as learning data by online learning, which could negatively affect prediction accuracy. To suppress this negative influence, we propose applying a method that fixes the prediction as positive during the initial stage of online learning. Additionally, we improved the method to consider the probability of overlooking. In our experiment, we demonstrate this negative influence on prediction accuracy and the effectiveness of our approach. The results show that our approach did not negatively affect AUC but significantly improved recall.
Software Engineering
What problem does this paper attempt to address?
This paper addresses the loss of prediction accuracy caused by defect overlooking when a software defect prediction model is built with online learning. Specifically:

1. **The impact of defect overlooking**:
   - When a module is predicted as "non-defective", developers usually write fewer test cases for it in order to allocate test resources efficiently. As a result, a module that is actually defective may be overlooked during testing, which corrupts the learning data of the prediction model.
   - These erroneous test results are then used as data for online learning and may reduce the accuracy of the prediction model.
2. **Two types of defect overlooking**:
   - **Type 1 overlooking**: When the prediction is "non-defective", existing defects may be overlooked because fewer test cases are written.
   - **Type 2 overlooking**: Even when the prediction is "defective" and a large number of tests are carried out, some defects may still go undetected.
3. **Solutions**:
   - The paper proposes a fixed prediction method: in the initial stage of online learning, a certain proportion of negative predictions are forcibly set to positive (that is, "non-defective" is changed to "defective") to reduce the impact of Type 1 overlooking.
   - The method is further improved by taking the overlooking probability into account. When the overlooking rate is low, fixed prediction is stopped to avoid an excessive increase in false-positive predictions.

Experimental verification shows that this method improves recall while minimizing the negative impact on AUC, precision, and F1-score.
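The two overlooking types and the fixed prediction rule can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the parameters (`initial_steps`, `flip_ratio`, `min_overlook_rate`), and the choice to treat the Type 1 probability as a known "overlooking rate" are all assumptions made for illustration, since the summary does not say how these quantities are set or estimated.

```python
import random


def observed_label(actual_defective, predicted_defective, p_type1, p_type2, rng):
    """Return the label that testing records for one module.

    Assumption: a real defect is missed with probability p_type1 when the
    module was predicted non-defective (Type 1 overlooking) and with
    probability p_type2 when it was predicted defective (Type 2).
    """
    if not actual_defective:
        return False  # nothing to find, so the recorded label is negative
    p_miss = p_type2 if predicted_defective else p_type1
    return not (rng.random() < p_miss)


def fixed_prediction(predicted_defective, step, initial_steps, flip_ratio,
                     overlook_rate, min_overlook_rate, rng):
    """Force some negative predictions to positive early in online learning.

    Hypothetical rule: flipping happens only while step < initial_steps and
    only while the overlooking rate is at least min_overlook_rate.
    """
    if predicted_defective:
        return True  # positive predictions are never changed
    in_initial_stage = step < initial_steps
    overlooking_matters = overlook_rate >= min_overlook_rate
    if in_initial_stage and overlooking_matters and rng.random() < flip_ratio:
        return True
    return False


def run_stream(modules, initial_steps, flip_ratio, p_type1, p_type2,
               min_overlook_rate, seed=0):
    """Process (raw_prediction, actual_defective) pairs in arrival order.

    Returns a list of (final_prediction, recorded_label) pairs. A real online
    learner would be updated with each recorded label inside the loop.
    """
    rng = random.Random(seed)
    results = []
    for step, (raw_prediction, actual_defective) in enumerate(modules):
        prediction = fixed_prediction(raw_prediction, step, initial_steps,
                                      flip_ratio, p_type1, min_overlook_rate, rng)
        label = observed_label(actual_defective, prediction,
                               p_type1, p_type2, rng)
        results.append((prediction, label))
    return results
```

With extreme settings the effect is easy to trace: if every defect is missed under a negative prediction (`p_type1=1.0`) and none under a positive one (`p_type2=0.0`), defective modules arriving during the initial stage are recorded correctly because their predictions are flipped to positive, while the same modules arriving later are recorded as non-defective.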