Missing Value Imputation via Clusterwise Linear Regression

Napsu Karmitsa, Sona Taheri, Adil Bagirov, Pauliina Makinen
DOI: https://doi.org/10.1109/tkde.2020.3001694
IF: 9.235
2020-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract: In this paper, a new method for preprocessing incomplete data is introduced. The method is based on clusterwise linear regression and combines two well-known approaches to missing value imputation: linear regression and clustering. The idea is to approximate missing values using only those data points that are somewhat similar to the incomplete data point. Clustering-based imputation methods use a similar idea. Here, however, linear regression is applied within each cluster to accurately predict the missing values, and this is done simultaneously with the clustering. The proposed method is tested on synthetic and real-world data sets and compared with other algorithms for missing value imputation. Numerical results demonstrate that the method produces the most accurate imputations in MCAR and MAR data sets with a clear structure and no more than 25 percent missing data.
computer science, information systems, artificial intelligence, engineering, electrical & electronic
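
The abstract describes fitting a separate linear regression inside each cluster and using the model of the cluster an incomplete point belongs to for imputation. The sketch below illustrates that general idea only. It is not the paper's algorithm: it swaps in a simple alternating heuristic (refit each cluster's line, then reassign points to the line with the smallest residual). It also assumes, purely for illustration, that only one column `y` has missing values, that the remaining features `X` are complete, and that an incomplete point is matched to the cluster with the nearest feature centroid. The names `fit_line`, `predict`, and `clr_impute` are hypothetical.

```python
import numpy as np


def fit_line(X, y):
    """Least-squares fit of y ~ X @ w + b; the intercept is the last coefficient."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(X, coef):
    """Evaluate a fitted linear model on the rows of X."""
    return np.column_stack([X, np.ones(len(X))]) @ coef


def clr_impute(X, y, k=2, n_iter=50):
    """Fill NaNs in y with cluster-specific linear models (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    known = ~np.isnan(y)
    Xk, yk = X[known], y[known]
    if len(Xk) < k:
        raise ValueError("need at least k complete rows")

    # Assumption: deterministic start from contiguous chunks sorted by feature 0.
    labels = np.empty(len(Xk), dtype=int)
    for j, chunk in enumerate(np.array_split(np.argsort(Xk[:, 0]), k)):
        labels[chunk] = j

    coefs = [None] * k
    for _ in range(n_iter):
        for j in range(k):
            members = labels == j
            # Refit when the cluster has enough points; otherwise keep the old model.
            if coefs[j] is None or members.sum() > X.shape[1]:
                coefs[j] = fit_line(Xk[members], yk[members])
        # Reassign each complete row to the model with the smallest squared residual.
        resid = np.column_stack([(yk - predict(Xk, c)) ** 2 for c in coefs])
        new_labels = resid.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels

    y_filled = y.copy()
    missing = ~known
    if missing.any():
        # Assumption: an incomplete row joins the cluster with the nearest centroid.
        centroids = np.array([
            Xk[labels == j].mean(axis=0) if (labels == j).any()
            else np.full(X.shape[1], np.inf)
            for j in range(k)
        ])
        Xm = X[missing]
        dists = ((Xm[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        nearest = dists.argmin(axis=1)
        y_filled[missing] = [
            predict(Xm[i:i + 1], coefs[nearest[i]])[0] for i in range(len(Xm))
        ]
    return y_filled


# Toy data: two well-separated groups, each following its own line (no noise).
x = np.concatenate([np.linspace(0, 1, 20), np.linspace(10, 11, 20)])
X = x.reshape(-1, 1)
y = np.where(x < 5, 2 * x + 1, -3 * x + 50)
y_obs = y.copy()
y_obs[[3, 25]] = np.nan  # hide one value in each group
y_hat = clr_impute(X, y_obs, k=2)
```

On this toy data a single global regression would fit neither group well, while the cluster-specific models recover the hidden values because each group follows its own line. Real data with noise, several incomplete columns, or poorly separated clusters would need a more careful initialization and optimization than this heuristic provides.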