The untold impact of learning approaches on software fault-proneness predictions: an analysis of temporal aspects

Lutz, Robyn R.
DOI: https://doi.org/10.1007/s10664-024-10454-8
IF: 3.762
2024-06-09
Empirical Software Engineering
Abstract: This paper aims to improve software fault-proneness prediction by investigating the unexplored effects on classification performance of the temporal decisions made by practitioners and researchers regarding (i) the interval for which they will collect longitudinal features (software metrics data), and (ii) the interval for which they will predict software bugs (the target variable). We call these specifics of the data used for training and of the target variable being predicted the learning approach, and explore the impact of the two most common learning approaches on the performance of software fault-proneness prediction, both within a single release of a software product and across releases. The paper presents empirical results from a study based on data extracted from 64 releases of twelve open-source projects. Results show that the learning approach has a substantial, and typically unacknowledged, impact on classification performance. Specifically, we show that one learning approach leads to significantly better performance than the other, both within-release and across-releases. Furthermore, this paper uncovers that, for within-release predictions, the difference in classification performance is due to different levels of class imbalance in the two learning approaches. Our findings show that improved specification of the learning approach is essential to understanding and explaining the performance of fault-proneness prediction models, as well as to avoiding misleading comparisons among them. The paper concludes with some practical recommendations and research directions based on our findings toward improved software fault-proneness prediction.
computer science, software engineering