The Impact of Defect (Re) Prediction on Software Testing

Yukasa Murakami, Yuta Yamasaki, Masateru Tsunoda, Akito Monden, Amjed Tahir, Kwabena Ebo Bennin, Koji Toda, Keitaro Nakasai
2024-09-10
Abstract: Cross-project defect prediction (CPDP) aims to use data from external projects, as historical data may not be available from the same project. In CPDP, deciding on a particular historical project to build a training model can be difficult. To help with this decision, a Bandit Algorithm (BA) based approach has been proposed in prior research to select the most suitable learning project. However, this BA method could lead to the selection of unsuitable data during the early iterations of BA (i.e., the early stage of software testing). Selecting an unsuitable model can reduce the prediction accuracy, leading to potential defect overlooking. This study aims to improve the BA method to reduce defect overlooking, especially during the early testing stages. Once all modules have been tested, modules tested in the early stage are re-predicted, and some modules are retested based on the re-prediction. To assess the impact of re-prediction and retesting, we applied five kinds of BA methods, using 8, 16, and 32 OSS projects as learning data. The results show that the newly proposed approach steadily reduced the probability of defect overlooking without degradation of prediction accuracy.
Software Engineering
What problem does this paper attempt to address?
This paper addresses defect omission in the early testing phase of cross-project defect prediction (CPDP). Specifically, it points out three problems:

1. **Challenges in cross-project defect prediction (CPDP)**: In CPDP, it is difficult to select appropriate external project data for training the model. An inappropriate selection can lower the accuracy of the prediction model and thereby increase the risk of defect omission.
2. **Selection problem of the Bandit algorithm (BA)**: Previous research has proposed BA-based methods to select the most suitable learning project. However, in the early stage of software testing, the Bandit algorithm may select inappropriate data, which makes predictions inaccurate and causes defect omission.
3. **Risk of defect omission in the early testing stage**: When an unsuitable model is selected early in testing, defective modules may be misjudged as non-defective. These modules then receive less testing effort, and their defects are overlooked.

To address these problems, the paper proposes reducing defect omission in the early testing stage through re-prediction and re-testing. The specific steps are:

- After all modules are tested, re-predict the modules that were tested in the early stage.
- Based on the re-prediction results, select some modules for re-testing so that defects overlooked in the first pass can still be found.

With this method, the paper aims to improve the BA-based approach, especially in the early testing stage, so that the probability of defect omission falls while prediction accuracy is maintained.
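The re-prediction and re-testing steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `select_and_repredict`, the epsilon-greedy policy, the reward (1 when a model's prediction matches the test outcome), the scoring of every arm after each test, the `early_count` cut-off for the "early stage", and the rule for choosing modules to re-test are all hypothetical choices made for the example.

```python
import random


def select_and_repredict(modules, models, early_count, epsilon=0.1, seed=0):
    """Epsilon-greedy model selection followed by re-prediction.

    modules: list of (features, is_defective) pairs in testing order.
    models: dict mapping a model name to a callable(features) -> bool,
            where True means "predicted defective".
    Returns (best_model_name, indices_of_early_modules_to_retest).
    """
    rng = random.Random(seed)
    names = list(models)
    pulls = {name: 0 for name in names}
    hits = {name: 0 for name in names}
    first_pass = []  # prediction in force when each module was first tested

    def mean_reward(name):
        return hits[name] / pulls[name] if pulls[name] else 0.0

    for features, is_defective in modules:
        if rng.random() < epsilon:
            chosen = rng.choice(names)            # explore a random model
        else:
            chosen = max(names, key=mean_reward)  # exploit; ties go to the first name
        first_pass.append(models[chosen](features))
        # Assumption: every arm is scored against the observed test outcome.
        for name in names:
            pulls[name] += 1
            hits[name] += int(models[name](features) == is_defective)

    # After all modules are tested, re-predict the early ones with the best model.
    best = max(names, key=mean_reward)
    retest = [
        i for i in range(min(early_count, len(modules)))
        # Assumed rule: re-test if first judged non-defective but now judged defective.
        if not first_pass[i] and models[best](modules[i][0])
    ]
    return best, retest


# Hypothetical data: one numeric feature per module, two toy models.
models = {"bad": lambda x: False, "good": lambda x: x > 5}
modules = [(7, True), (2, False), (8, True), (1, False), (9, True)]
best, retest = select_and_repredict(modules, models, early_count=2, epsilon=0.0)
print(best, retest)  # good [0]
```

In this toy run the first module is predicted by the unsuitable model before any reward has been observed, so it is judged non-defective. Re-prediction with the model that performed best over the whole run flags it for re-testing.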
### Formula representation

The paper uses the following formulas to compare methods:

- **Relative difference (RDIFF)**: \[ RDIFF(\alpha,\beta)=\frac{\text{criterion of }\beta-\text{criterion of }\alpha}{\text{criterion of }\alpha} \]
- **Absolute difference (DIFF)**: \[ DIFF(\alpha,\beta)=\text{criterion of }\beta-\text{criterion of }\alpha \]

Here, \(\alpha\) and \(\beta\) denote the two methods or models being compared, and "criterion" is the evaluation measure computed for each. RDIFF is DIFF divided by the criterion of \(\alpha\), which is equivalent to \(\text{criterion of }\beta / \text{criterion of }\alpha - 1\).

### Experimental results

The experimental results show that re-prediction and re-testing significantly increase the number of discovered defects and, in most cases, do not reduce the AUC (Area Under the Curve) value. This indicates that the new method is effective in reducing defect omission.

### Conclusion

The re-prediction and re-testing method proposed in the paper can effectively reduce defect omission in the early testing stage and is especially suited to cross-project defect prediction. Although the method may increase the re-testing workload, it can significantly reduce the probability of defect omission and thereby improve software quality.
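As a small worked illustration of the DIFF and RDIFF measures from the formula section, the sketch below computes both for a pair of criterion values. It assumes that RDIFF is the usual relative difference, that is, DIFF divided by the criterion of \(\alpha\). The function names and the sample values are hypothetical and are not taken from the paper's results.

```python
def diff(criterion_alpha, criterion_beta):
    """Absolute difference: criterion of beta minus criterion of alpha."""
    return criterion_beta - criterion_alpha


def rdiff(criterion_alpha, criterion_beta):
    """Relative difference, assumed to be DIFF divided by the criterion of alpha."""
    if criterion_alpha == 0:
        raise ValueError("criterion of alpha must be non-zero")
    return diff(criterion_alpha, criterion_beta) / criterion_alpha


# Hypothetical criterion values for a baseline method (alpha) and a new method (beta).
alpha_value, beta_value = 0.5, 0.75
print(diff(alpha_value, beta_value))   # 0.25
print(rdiff(alpha_value, beta_value))  # 0.5
```

A positive value means \(\beta\) scores higher than \(\alpha\) on the criterion. Whether higher is better depends on the criterion being compared.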