Sparse Data Cleaning using Multiple Imputations

Sunghae Jun, Seung-Joo Lee, Kyung-Whan Oh
DOI: https://doi.org/10.5391/IJFIS.2004.4.1.119
2004-06-01
International Journal of Fuzzy Logic and Intelligent Systems
Abstract: Real-world data such as web log files tend to be incomplete, yet we must still extract useful knowledge from them to support good decisions. Web log data contain much useful information, including hyperlink structure and the web usage of connected users. However, web data are too large to be used effectively for knowledge discovery and, to make matters worse, they are very sparse. We overcome this sparseness problem by using the Markov chain Monte Carlo (MCMC) method for multiple imputation. Imputing the missing values turns sparse web data into complete data. Our study may be a useful tool for discovering knowledge from sparse data sets. The sparser the data, the better the performance of MCMC imputation. We verified our work through experiments on data from the UCI Machine Learning Repository.
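The abstract does not spell out the imputation model, so the following is only a minimal sketch of MCMC-based multiple imputation (data augmentation), not the authors' implementation. It assumes a univariate normal model with a noninformative prior, marks missing entries with `None`, and uses hypothetical names (`mcmc_impute`, `pool_mean`, `burn_in`, `thin`). The chain alternates an imputation step (draw the missing values given the current parameters) with a posterior step (draw the parameters given the completed data), and the completed data sets are then combined with Rubin's rules.

```python
import random
import statistics


def mcmc_impute(values, m=5, burn_in=200, thin=50, seed=0):
    """Return m completed copies of `values` (None marks a missing entry).

    Assumption: univariate normal data with a noninformative prior.
    This model is illustrative and is not taken from the paper.
    """
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    missing_idx = [i for i, v in enumerate(values) if v is None]
    n = len(values)

    # Start the chain from the observed-data estimates.
    mu = statistics.fmean(observed)
    sigma2 = statistics.variance(observed)

    completed_sets = []
    for it in range(1, burn_in + m * thin + 1):
        # I-step: draw each missing value from N(mu, sigma2).
        filled = list(values)
        for i in missing_idx:
            filled[i] = rng.gauss(mu, sigma2 ** 0.5)

        # P-step: draw (sigma2, mu) from the posterior given the completed data.
        xbar = statistics.fmean(filled)
        scatter = sum((x - xbar) ** 2 for x in filled)
        chi2 = rng.gammavariate((n - 1) / 2.0, 2.0)  # chi-square, n-1 d.o.f.
        sigma2 = scatter / chi2
        mu = rng.gauss(xbar, (sigma2 / n) ** 0.5)

        # After burn-in, keep every `thin`-th completed data set.
        if it > burn_in and (it - burn_in) % thin == 0:
            completed_sets.append(filled)
    return completed_sets


def pool_mean(completed_sets):
    """Combine the per-data-set sample means with Rubin's rules."""
    m = len(completed_sets)
    estimates = [statistics.fmean(d) for d in completed_sets]
    within = statistics.fmean(
        [statistics.variance(d) / len(d) for d in completed_sets]
    )
    between = statistics.variance(estimates)
    pooled = statistics.fmean(estimates)
    total_var = within + (1 + 1 / m) * between
    return pooled, total_var


if __name__ == "__main__":
    data = [1.0, None, 2.5, 3.0, None, 2.0, 1.5, None, 2.2, 2.8]
    sets = mcmc_impute(data, m=5)
    estimate, variance = pool_mean(sets)
    print(f"pooled mean = {estimate:.3f}, total variance = {variance:.4f}")
```

Spacing the retained draws by `thin` iterations is meant to reduce the dependence between successive imputed data sets; the values used here are arbitrary defaults rather than settings reported in the paper.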