iDMI: A novel technique for missing value imputation using a decision tree and expectation-maximization algorithm

Md. Geaur Rahman,M. Islam
DOI: https://doi.org/10.1109/ICCITECHN.2014.6997351
2014-03-08
Abstract:In this paper we present a novel technique called iDMI that imputes missing values of a data set by combining a decision tree algorithm (DT) and an expectation-maximization (EMI) algorithm. We first divide a data set into horizontal segments through applying a DT algorithm such as C4.5, and then apply an EMI algorithm on each segment in order to impute the missing values belong to the segment. If all numerical attribute values of a record are missing then we impute them by the mean values of the attributes of the records belong to a segment where the record falls in, and thereby reduce the computational time complexity of iDMI compare to an existing technique called DMI which calculate the mean value of an attribute by using all records of a data set. We evaluate the performance of iDMI over three high quality existing techniques on two real data sets in terms of four evaluation criteria. Our initial experimental results, including several statistical significance analysis, indicate the superiority of iDMI over the existing techniques.
Computer Science
What problem does this paper attempt to address?