A Hierarchical Missing Value Imputation Method By Correlation-Based K-Nearest Neighbors

Xin Liu, Xiaochen Lai, Liyong Zhang
DOI: https://doi.org/10.1007/978-3-030-29516-5_38
2020-01-01
Abstract: Missing values are a common occurrence in real-world datasets, and many methods have been proposed to handle them. Among these methods, KNN imputation has attracted considerable attention because it is simple to implement, easy to understand, and relatively accurate. However, it ignores the influence that correlations between attributes have on the similarity of records. In this paper, we take these correlations into account when selecting the nearest neighbors, and we impute the incomplete records successively according to the number of missing values in each record. During imputation, the correlation coefficients are first calculated from the complete records and then updated using the union of the complete records and the imputed records. As more of the data is used, the estimated correlations between attributes become more accurate, which makes the selected nearest neighbors more appropriate. Experimental results demonstrate that the improved method is more effective at missing value imputation.
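The procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `correlation_knn_impute`, the use of absolute Pearson correlations as weights in a Euclidean distance, the mean of the neighbors as the imputed value, and the default `k` are all assumptions made for illustration, since the abstract does not specify them. The sketch also assumes records are processed from fewest to most missing values and that at least two complete records exist.

```python
import numpy as np

def correlation_knn_impute(X, k=3):
    """Hypothetical sketch: impute NaNs record by record, fewest missing first."""
    X = np.asarray(X, dtype=float).copy()
    missing = np.isnan(X)
    # Reference pool starts with the complete records only.
    pool = [i for i in range(len(X)) if not missing[i].any()]
    # Incomplete records, ordered by their number of missing values (assumed ascending).
    order = sorted((i for i in range(len(X)) if missing[i].any()),
                   key=lambda i: missing[i].sum())
    for i in order:
        ref = X[pool]
        # Correlations are recomputed from complete plus already-imputed records.
        corr = np.nan_to_num(np.abs(np.corrcoef(ref, rowvar=False)))
        obs = ~missing[i]
        for j in np.where(missing[i])[0]:
            # Assumed weighting: each observed attribute a counts by |corr(a, j)|.
            w = corr[j, obs]
            if w.sum() == 0:
                w = np.ones_like(w)  # fall back to an unweighted distance
            dist = np.sqrt((w * (ref[:, obs] - X[i, obs]) ** 2).sum(axis=1))
            nearest = np.argsort(dist)[:k]
            # Assumed aggregation: mean of the k nearest neighbors' values.
            X[i, j] = ref[nearest, j].mean()
        # The imputed record joins the pool used for later records.
        pool.append(i)
    return X

data = np.array([[1, 2, 5], [2, 4, 1], [3, 6, 4], [4, 8, 2],
                 [2, np.nan, 3], [np.nan, np.nan, 1]], dtype=float)
filled = correlation_knn_impute(data, k=2)
```

Because each imputed record is appended to the pool, later records (those with more missing values) are matched against a larger reference set, which is the data-utilization effect the abstract describes.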