An Empirical Analysis of Three-Stage Data-Preprocessing for Analogy-Based Software Effort Estimation on the ISBSG Data.

Jianglin Huang,Yan-Fu Li,Jacky Wai Keung,Y. T. Yu,W. K. Chan
DOI: https://doi.org/10.1109/qrs.2017.54
2017-01-01
Abstract:Analogy-based software effort estimation is a method to estimate the project cost of an unseen project based on analogies against previous projects sharing selected features. The validity of the selected features depends on many factors, and one of most crucial factors is the effectiveness of the data-preprocessing techniques applied to the datasets of the previous projects. In this paper, we report the first controlled experiment that studies the class of three-stage data-preprocessing techniques with stages of missing data imputation, data normalization, and feature selection for analogy-based effort estimation. We conducted our investigation on the ISBSG data. The experimental results show that three-stage data-preprocessing techniques have significant impacts on the resultant effort estimation accuracy. The results also indicate that the combined use of Z-Score normalization, kNN imputation and mutual information based feature weighting can be an effective choice for analogy-based effort estimation.
What problem does this paper attempt to address?