Improving high-impact bug report prediction with combination of interactive machine learning and active learning

Xiaoxue Wu, Wei Zheng, Xiang Chen, Yu Zhao, Tingting Yu, Dejun Mu
DOI: https://doi.org/10.1016/j.infsof.2021.106530
IF: 3.9
2021-05-01
Information and Software Technology
Abstract:
Context: Bug reports record issues found during software development and maintenance. A high-impact bug report (HBR) describes an issue that can cause severe damage if it occurs after deployment. Identifying HBRs in the bug repository as early as possible is crucial for guaranteeing software quality.
Objective: In recent years, many machine learning-based approaches have been proposed for HBR prediction, most of them based on supervised machine learning. However, supervised machine learning assumes access to a large amount of labeled data, which is often difficult to gather in practice.
Method: In this paper, we propose hbrPredictor, which combines interactive machine learning and active learning for HBR prediction. On the one hand, it can dramatically reduce the number of bug reports required to train the prediction model; on the other hand, it improves the diversity and generalization ability of the training samples via uncertainty sampling.
Result: We take security bug report (SBR) prediction as an example of HBR prediction and perform a large-scale experimental evaluation on datasets from different open-source projects. The results show that (1) hbrPredictor substantially outperforms the two baselines and achieves maximum F1-score and AUC values of 0.7939 and 0.8789, respectively; (2) with the dynamic stop criteria, hbrPredictor could reach its best performance using only 45% of the total bug reports for small-sized datasets and only 13% for large-sized datasets.
Conclusion: By reducing the number of required training samples, hbrPredictor could substantially reduce the data-labeling effort without decreasing the effectiveness of the model.
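The Method paragraph describes active learning in which a human labels, one at a time, the bug reports the current model is least sure about (uncertainty sampling). The Python sketch below illustrates such a pool-based loop under explicit assumptions: the logistic-regression classifier, the toy 2-D vectors standing in for bug-report features, the simulated `oracle` standing in for a human reviewer, and the "predictions unchanged for `patience` rounds" stop rule are all hypothetical choices for illustration. The abstract does not state hbrPredictor's actual classifier, features, or dynamic stop criteria, so this is not the authors' implementation.

```python
import math
import random

# Minimal pool-based active learning with least-confidence (uncertainty)
# sampling. Classifier, toy features, simulated oracle and stop rule are
# illustrative assumptions, NOT hbrPredictor's actual implementation.


def sigmoid(z):
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, epochs=200, lr=0.5):
    """Fit logistic regression by batch gradient descent (stand-in classifier)."""
    n_features, m = len(X[0]), len(X)
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict_proba(model, x):
    """Estimated probability that x is a high-impact (positive) report."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def least_confident(model, pool, candidates):
    """Uncertainty sampling: the candidate whose probability is closest to 0.5."""
    return min(candidates, key=lambda i: abs(predict_proba(model, pool[i]) - 0.5))


def active_learning(pool, oracle, seed_idx, patience=3, max_queries=30):
    """Query at most `max_queries` labels, one per round.

    oracle(i) plays the human reviewer who labels pool[i] (the interactive
    part). Stopping once predictions on the whole pool have not changed for
    `patience` consecutive rounds is a hypothetical dynamic stop criterion.
    """
    labeled = {i: oracle(i) for i in seed_idx}

    def fit():
        idx = list(labeled)
        return train_logreg([pool[i] for i in idx], [labeled[i] for i in idx])

    prev_preds, stable_rounds = None, 0
    for _ in range(max_queries):
        model = fit()
        preds = [predict_proba(model, x) >= 0.5 for x in pool]
        stable_rounds = stable_rounds + 1 if preds == prev_preds else 0
        prev_preds = preds
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if stable_rounds >= patience or not unlabeled:
            return model, labeled
        query = least_confident(model, pool, unlabeled)
        labeled[query] = oracle(query)  # ask the "human" for one more label
    return fit(), labeled  # retrain so the last queried label is used


def demo(n=200, seed=0):
    """Toy run: 2-D points, positive ("high-impact") when x0 + x1 > 0."""
    rng = random.Random(seed)
    pool = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]

    def oracle(i):
        return 1 if pool[i][0] + pool[i][1] > 0 else 0

    positives = [i for i in range(n) if oracle(i) == 1]
    negatives = [i for i in range(n) if oracle(i) == 0]
    seeds = positives[:2] + negatives[:2]
    model, labeled = active_learning(pool, oracle, seeds)
    return pool, oracle, seeds, model, labeled


if __name__ == "__main__":
    pool, oracle, seeds, model, labeled = demo()
    hits = sum((predict_proba(model, x) >= 0.5) == (oracle(i) == 1)
               for i, x in enumerate(pool))
    print(f"labeled {len(labeled)}/{len(pool)} items, pool accuracy {hits / len(pool):.2f}")
```

Running the module labels the four seeds plus at most 30 queried items from a pool of 200 and prints the resulting accuracy on the pool. In a real setting the oracle would be a reviewer and the vectors would be features extracted from bug-report text. Least-confidence selection is only one uncertainty measure; margin-based or entropy-based sampling are common alternatives.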
computer science, information systems, software engineering
What problem does this paper attempt to address?