Categorical Missing Data Imputation Approach Via Sparse Representation.

Xiaochen Shao, Sen Wu, Xiaodong Feng, Rui Song
DOI: https://doi.org/10.1504/ijstm.2016.078542
2016-01-01
International Journal of Services Technology and Management
Abstract: K-nearest neighbour (KNN) is an important method for imputing missing categorical data. Its effectiveness is highly sensitive to local parameters such as the choice of similarity function and the number of neighbours. To address these two issues, a categorical missing data imputation algorithm (CSR) is proposed. It first applies a matrix transform to make the categorical data more amenable to computation. It then introduces a locality constraint into sparse representation theory by using KNN to construct the dictionary. Next, the method obtains a weight vector for each instance with missing values, one that reflects smoothness and local structure. Finally, using the sparse reconstruction coefficient vector, the algorithm fills in each missing attribute with the value that has the largest corresponding reconstruction value. Empirical tests show that CSR outperforms KNNimpute (including its two derivative methods, IKNNimpute and SKNNimpute) and LLSimpute in terms of efficiency and stability.
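The four steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, since the abstract gives neither the objective function nor the solver. The sketch assumes that the matrix transform is one-hot encoding, that neighbours are ranked by Hamming-style distance on the observed attributes, that the weights come from an L1-penalised least-squares problem solved with ISTA, and that missing entries are coded as -1. The names `one_hot`, `sparse_code`, `impute_row`, `k` and `lam` are all hypothetical.

```python
import numpy as np

def one_hot(data, n_values):
    """Encode integer-coded categories; a missing entry (-1) gives an all-zero block."""
    blocks = []
    for j, k in enumerate(n_values):
        block = np.zeros((data.shape[0], k))
        rows = np.flatnonzero(data[:, j] >= 0)
        block[rows, data[rows, j]] = 1.0
        blocks.append(block)
    return np.hstack(blocks)

def sparse_code(D, x, lam=0.05, n_iter=500):
    """Approximately minimise 0.5*||x - D w||^2 + lam*||w||_1 with ISTA (assumed solver)."""
    w = np.zeros(D.shape[1])
    step = 1.0 / (np.linalg.norm(D, 2) ** 2 + 1e-12)  # 1 / Lipschitz constant
    for _ in range(n_iter):
        z = w - step * (D.T @ (D @ w - x))             # gradient step
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return w

def impute_row(data, row, n_values, k=5, lam=0.05):
    """Fill the missing attributes of one row; assumes at least one attribute is observed."""
    offsets = np.concatenate(([0], np.cumsum(n_values)))
    X = one_hot(data, n_values)
    # One-hot columns that belong to the observed attributes of the target row.
    obs_cols = np.concatenate([
        np.arange(offsets[j], offsets[j + 1])
        for j in range(data.shape[1]) if data[row, j] >= 0
    ])
    # Locality constraint: the dictionary holds only the k nearest complete rows.
    complete = np.flatnonzero((data >= 0).all(axis=1))
    dist = np.abs(X[complete][:, obs_cols] - X[row, obs_cols]).sum(axis=1)
    neighbours = complete[np.argsort(dist, kind="stable")[:k]]
    D = X[neighbours].T                                # each column is one neighbour
    # The weights are fitted on the observed part, then reused for the full vector.
    w = sparse_code(D[obs_cols], X[row, obs_cols], lam)
    recon = D @ w
    filled = data[row].copy()
    for j in range(data.shape[1]):
        if data[row, j] < 0:
            # Pick the category with the largest reconstruction value.
            filled[j] = int(np.argmax(recon[offsets[j]:offsets[j + 1]]))
    return filled

data = np.array([[0, 0, 0], [0, 0, 0], [1, 1, 1], [1, 1, 1], [0, 0, -1]])
print(impute_row(data, 4, [2, 2, 2], k=2))
```

Because the weights are fitted rather than fixed, neighbours that do not help to reconstruct the observed attributes tend to receive zero weight. This is one plausible reading of how the approach reduces sensitivity to the number of neighbours.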