Efficient Probabilistic Frequent Itemset Mining In Big Sparse Uncertain Data

Jing Xu, Ning Li, Xiao-Jiao Mao, Yu-Bin Yang
DOI: https://doi.org/10.1007/978-3-319-13560-1_19
2014-01-01
Abstract: Probabilistic frequent itemset (PFI) mining in uncertain data has recently drawn increasing attention from the data mining community. However, data generated in network environments, such as machine logs and retail transactions, tends to be big, sparse, and uncertain because of random factors including unavoidable network latency, unfaithful collection, and unreliable transmission. As a result, most available PFI mining algorithms are not adequately effective in dealing with uncertain data that is very large and extremely sparse. To address this issue, we propose a novel tree structure, ApproxFP-Tree, and a parallelized ApproxFP algorithm based on the MapReduce platform, aiming to mine all PFIs in big, sparse, and uncertain data efficiently. Experimental results on real-world and synthetic databases are presented and analyzed to show that our approach is significantly more efficient than state-of-the-art algorithms.