MapReduce-based Parallelized Approximation of Frequent Itemsets Mining in Uncertain Data.

Jing Xu, Xiao-Jiao Mao, Wen-Yang Lu, Qi-Hai Zhu, Ning Li, Yu-Bin Yang
DOI: https://doi.org/10.1007/978-3-319-26561-2_17
2015-01-01
Abstract: In recent years, frequent itemsets mining in uncertain data has drawn increasing attention from the data mining community. Currently, frequent itemsets mining algorithms for uncertain data mainly define frequent itemsets by the expected support rather than the probabilistic support, since the computational complexity of the latter is prohibitively high. To address this issue, various approximation algorithms for mining probabilistic frequent itemsets have been proposed. However, the existing approximation algorithms are not adequately effective when the uncertain data is very large, or extremely dense or sparse. In this paper, we propose a parallelized approximation algorithm, based on the MapReduce platform, that is capable of mining probabilistic frequent itemsets on large-scale, dense, or sparse uncertain data. Experimental results are presented and analyzed to demonstrate the computational effectiveness of our algorithm.
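To make the distinction between the two support definitions concrete, the following is a minimal sketch, not the algorithm proposed in the paper. It assumes that each transaction contains the itemset independently with a known probability; the function names, the sample probabilities, and the threshold `min_sup` are hypothetical and chosen only for illustration. Expected support is a single sum, while the exact probability that the support reaches a threshold requires a dynamic program over the whole support distribution, which is quadratic in the number of transactions and is the kind of cost that motivates approximation.

```python
from typing import List


def expected_support(probs: List[float]) -> float:
    """Expected support: sum of per-transaction containment probabilities."""
    return sum(probs)


def frequentness_probability(probs: List[float], min_sup: int) -> float:
    """Exact P(support >= min_sup), assuming independent transactions.

    Builds the full support distribution (a Poisson binomial distribution)
    one transaction at a time; cost is O(n^2) for n transactions.
    """
    # dist[k] = probability that exactly k transactions seen so far
    # contain the itemset.
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)   # this transaction lacks the itemset
            nxt[k + 1] += mass * p       # this transaction contains it
        dist = nxt
    return sum(dist[min_sup:])


# Hypothetical containment probabilities for one itemset in four transactions.
probs = [0.9, 0.5, 0.5, 0.1]
print(expected_support(probs))
print(frequentness_probability(probs, 2))
```

Under the expected-support definition, the itemset would be compared against a support threshold directly; under the probabilistic definition, the second value would be compared against a separate probability threshold. The approximation step and the MapReduce parallelization described in the abstract are not shown here.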