Efficient Utilization of Missing Data in Cost-Sensitive Learning
Xiaofeng Zhu, Jianye Yang, Chengyuan Zhang, Shichao Zhang
DOI: https://doi.org/10.1109/tkde.2019.2956530
IF: 9.235
2021-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract: Unlike previous imputation methods, which impute the missing values of incomplete samples using only the information in the complete samples, this paper proposes a Data-driven Incremental imputation Model (DIM), which uses all available information in the data set to impute missing values economically, effectively, in order, and iteratively. To this end, we propose a scoring rule that ranks the missing features by taking into account both an economical criterion and the effective imputation information. The economical criterion considers both the imputation cost and the discriminative ability of each feature, while the effective imputation information makes it possible to use all observed information in the data set, including values that have already been imputed, to impute the remaining missing values. During imputation, DIM first detects need-not-impute samples to reduce the imputation cost and noise, and then imputes the top-ranked missing features first. The process imputes the missing features in order until all missing values are imputed or the imputation budget is exhausted. Experimental results on UCI data sets demonstrate the advantages of DIM over the comparison methods in terms of prediction accuracy and classification accuracy.
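The budgeted, ranked, incremental loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' DIM: the scoring rule (discriminative ability per unit cost, weighted by how much observed information the rows missing a feature currently hold), the need-not-impute test (skip samples whose missing fraction exceeds `max_missing_frac`), the per-value cost model, and the k-nearest-neighbour imputer are all hypothetical stand-ins chosen only to show the control flow. The names `feature_score`, `need_not_impute`, `knn_value`, and `incremental_impute` are likewise illustrative.

```python
import math
from typing import List, Optional, Sequence, Set, Tuple

Row = List[Optional[float]]  # None marks a missing value


def feature_score(j: int, data: Sequence[Row], skip: Set[int],
                  cost: Sequence[float], disc: Sequence[float]) -> float:
    """Hypothetical scoring rule: discriminative ability per unit cost,
    weighted by the average fraction of other features that are available
    (observed or already imputed) in the rows still missing feature j."""
    rows = [row for i, row in enumerate(data) if i not in skip and row[j] is None]
    if not rows:
        return 0.0
    info = sum(sum(v is not None for v in row) / max(len(row) - 1, 1)
               for row in rows) / len(rows)
    return disc[j] * info / cost[j]


def need_not_impute(row: Row, max_missing_frac: float) -> bool:
    """Hypothetical test: skip samples that are mostly missing."""
    return sum(v is None for v in row) / len(row) > max_missing_frac


def knn_value(data: Sequence[Row], i: int, j: int, k: int) -> Optional[float]:
    """Average feature j over the k nearest rows that have it, measuring
    distance only on features available in both rows (imputed values included)."""
    donors: List[Tuple[float, float]] = []
    for r, row in enumerate(data):
        if r == i or row[j] is None:
            continue
        diffs = [(a - b) ** 2 for a, b in zip(data[i], row)
                 if a is not None and b is not None]
        if diffs:
            donors.append((math.sqrt(sum(diffs) / len(diffs)), row[j]))
    if not donors:
        return None
    donors.sort(key=lambda d: d[0])
    nearest = donors[:k]
    return sum(v for _, v in nearest) / len(nearest)


def incremental_impute(data: Sequence[Row], cost: Sequence[float],
                       disc: Sequence[float], budget: float, k: int = 3,
                       max_missing_frac: float = 0.5) -> Tuple[List[Row], float]:
    """Impute the top-ranked feature first, re-rank, and repeat until no
    feature is pending or the next value would exceed the budget."""
    data = [list(row) for row in data]  # work on a copy of the input
    skip = {i for i, row in enumerate(data)
            if need_not_impute(row, max_missing_frac)}
    pending = {j for j in range(len(cost))
               if any(data[i][j] is None for i in range(len(data)) if i not in skip)}
    spent = 0.0
    while pending:
        # Re-rank each round: values imputed for one feature add information
        # to rows that are still missing other features.
        j = max(pending, key=lambda f: feature_score(f, data, skip, cost, disc))
        pending.discard(j)
        for i, row in enumerate(data):
            if i in skip or row[j] is not None:
                continue
            if spent + cost[j] > budget:
                return data, spent  # budget exhausted: stop early
            value = knn_value(data, i, j, k)
            if value is not None:
                row[j] = value  # later imputations can reuse this value
                spent += cost[j]
    return data, spent
```

On a small four-row table whose last row is entirely missing, the sketch skips that row, fills the cheaper and more discriminative feature first, and, with a tight enough budget, stops before the costlier feature is touched. Re-ranking inside the loop is what lets earlier imputations inform later ones, mirroring the abstract's point that imputed values become part of the available information.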