Understanding the value of considering client usage context in package cohesion for fault-proneness prediction

Yangyang Zhao, Yibiao Yang, Hongmin Lu, Jinping Liu, Hareton Leung, Yansong Wu, Yuming Zhou, Baowen Xu
DOI: https://doi.org/10.1007/s10515-016-0198-6
IF: 1.677
2016-03-28
Automated Software Engineering
Abstract: To date, many package cohesion metrics have been proposed from the internal structure view and the external usage view. Based on whether client usage context (i.e., the way packages are used by their clients) is exploited, we group these metrics into two categories: non-context-based and context-based. Currently, there is no comprehensive empirical research devoted to understanding the actual value of client usage context for fault-proneness prediction. In this paper, we conduct a thorough empirical study to investigate the value of considering client usage context in package cohesion for fault-proneness prediction. First, we use principal component analysis to examine the relationships between context-based and non-context-based cohesion metrics. Second, we employ univariate logistic regression analysis to investigate the correlation between context-based cohesion metrics and fault-proneness. Third, we build multivariate prediction models to analyze the ability of context-based cohesion metrics to predict fault-proneness when used alone or together with non-context-based cohesion metrics. To obtain comprehensive evaluations, we evaluate the effectiveness of these multivariate models in the ranking and classification scenarios from both cross-validation and across-version perspectives. The experimental results show that (1) context-based cohesion metrics are complementary to non-context-based cohesion metrics; (2) most context-based cohesion metrics have a significantly negative association with fault-proneness; and (3) when used alone or together with non-context-based cohesion metrics, context-based cohesion metrics can substantially improve the effectiveness of fault-proneness prediction in most studied systems under both cross-validation and across-version evaluation. In summary, client usage context has important value in package cohesion for fault-proneness prediction.
computer science, software engineering
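
The univariate logistic regression step mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation or data: the function names (`fit_univariate_logistic`, `make_synthetic_packages`), the gradient-ascent fitting procedure, and the synthetic packages are all assumptions made for illustration. The sketch fits P(faulty) = sigmoid(b0 + b1 * metric) for a single cohesion metric and inspects the sign of b1, which is how a negative association with fault-proneness would show up.

```python
import math
import random


def fit_univariate_logistic(x, y, lr=0.1, epochs=2000):
    """Fit P(y=1) = sigmoid(b0 + b1 * z) by batch gradient ascent on the
    mean log-likelihood, where z is the standardized version of x.
    Hypothetical helper for illustration, not from the paper."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n) or 1.0
    z = [(v - mean) / std for v in x]
    b0, b1 = 0.0, 0.0
    for _ in range(epochs):
        g0 = g1 = 0.0
        for zi, yi in zip(z, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * zi)))
            g0 += yi - p          # gradient w.r.t. intercept
            g1 += (yi - p) * zi   # gradient w.r.t. slope
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1  # coefficients on the standardized metric


def make_synthetic_packages(n=200, seed=0):
    """Generate made-up packages in which higher cohesion lowers the
    probability of being faulty. Purely synthetic, assumed data."""
    rng = random.Random(seed)
    cohesion, faulty = [], []
    for _ in range(n):
        c = rng.random()  # cohesion value in [0, 1)
        p_fault = 1.0 / (1.0 + math.exp(-(2.0 - 4.0 * c)))
        cohesion.append(c)
        faulty.append(1 if rng.random() < p_fault else 0)
    return cohesion, faulty


cohesion, faulty = make_synthetic_packages()
b0, b1 = fit_univariate_logistic(cohesion, faulty)
# A negative slope means higher cohesion goes with lower fault-proneness.
print("slope on standardized cohesion:", round(b1, 3))
```

In a real replication one would more likely use a statistics library that also reports significance levels for the coefficient; the hand-written fit here only keeps the sketch dependency-free.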