An Empirical Study of Dynamic Incomplete-Case Nearest Neighbor Imputation in Software Quality Data.

Jianglin Huang, Hongyi Sun, Yan-Fu Li, Min Xie
DOI: https://doi.org/10.1109/qrs.2015.16
2015-01-01
Abstract: Software quality prediction is an important yet difficult problem in software project development and management. Historical datasets can be used to build models for software quality prediction. However, missing data significantly degrade the predictive ability of such models in knowledge discovery. Instead of ignoring missing observations, we investigate and improve incomplete-case k-nearest neighbor imputation. K-nearest neighbor imputation is widely applied, but it has rarely been adapted to select the most appropriate parameter settings for each individual imputation. This work applies imputation to four well-known software quality datasets to assess the impact of the proposed method, and compares it with mean imputation and other commonly used variants of k-nearest neighbor imputation. The empirical results show that the proposed dynamic incomplete-case nearest neighbor imputation performs better when the data are missing completely at random or the missingness is non-ignorable, regardless of the percentage of missing values.
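To make the idea of incomplete-case nearest neighbor imputation concrete, the following is a minimal sketch, not the authors' implementation. It assumes that missing values are encoded as `None`, that a donor row qualifies if it is observed on the feature being imputed and on every feature the recipient has observed (even if the donor is incomplete elsewhere), and that the imputed value is the unweighted mean over the `k` nearest donors by Euclidean distance. The function name `ic_knn_impute` and the fixed `k` are illustrative assumptions; the abstract describes a dynamic variant that tunes parameters for each imputation, and this sketch does not attempt to reproduce that step.

```python
import math


def ic_knn_impute(rows, k=3):
    """Fill None entries using incomplete-case k-nearest-neighbor donors.

    Illustrative sketch only: fixed k, Euclidean distance, mean of donors.
    Distances and donors always come from the original (unimputed) rows.
    """
    n_cols = len(rows[0])
    result = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j in range(n_cols):
            if row[j] is not None:
                continue
            # Features the recipient row actually has observed.
            observed = [c for c in range(n_cols) if row[c] is not None]
            candidates = []
            for m, other in enumerate(rows):
                # A donor must have the target feature observed.
                if m == i or other[j] is None:
                    continue
                # A donor must be observed wherever the recipient is;
                # it may still be missing values in other columns.
                if any(other[c] is None for c in observed):
                    continue
                dist = math.sqrt(sum((row[c] - other[c]) ** 2 for c in observed))
                candidates.append((dist, other[j]))
            if not candidates:
                continue  # no usable donor: leave the value missing
            candidates.sort(key=lambda t: t[0])
            nearest = candidates[:k]
            result[i][j] = sum(v for _, v in nearest) / len(nearest)
    return result


# Hypothetical toy data: the second row is missing its last feature.
data = [
    [1.0, 2.0, 10.0],
    [1.0, 2.0, None],
    [1.1, 2.1, 12.0],
    [9.0, 9.0, 100.0],
]
filled = ic_knn_impute(data, k=2)
print(filled[1][2])  # 11.0, the mean of the two nearest donors (10.0 and 12.0)
```

In practice the features would usually be scaled before computing distances, and a dynamic scheme would replace the fixed `k` with a value chosen per imputation.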