Predicting Vulnerable Components via Text Mining or Software Metrics? An Effort-Aware Perspective

Yaming Tang,Fei Zhao,Yibiao Yang,Hongmin Lu,Yuming Zhou,Baowen Xu
DOI: https://doi.org/10.1109/QRS.2015.15
2015-01-01
Abstract:In order to identify vulnerable software components, developers can take software metrics as predictors or use text mining techniques to build vulnerability prediction models. A recent study reported that text mining based models have higher recall than software metrics based models. However, this conclusion was drawn without considering the sizes of individual components which affects the code inspection effort to determine whether a component is vulnerable. In this paper, we investigate the predictive power of these two kinds of prediction models in the context of effort-aware vulnerability prediction. To this end, we use the same data sets, containing 223 vulnerabilities found in three web applications, to build vulnerability prediction models. The experimental results show that: (1) in the context of effort-aware ranking scenario, text mining based models only slightly outperform software metrics based models, (2) in the context of effort-aware classification scenario, text mining based models perform similarly to software metrics based models in most cases, and (3) most of the effect sizes (i.e. the magnitude of the differences) between these two kinds of models are trivial. These results suggest that, from the viewpoint of practical application, software metrics based models are comparable to text mining based models. Therefore, for developers, software metrics based models are practical choices for vulnerability prediction, as the cost to build and apply these models is much lower.
What problem does this paper attempt to address?