Data-Intensive Learning Of Uncertain Knowledge

Kun Yue, Weiyi Liu, Hao Wu, Dapeng Tao, Ming Gao
2018-01-01
Abstract: In this chapter, we propose a parallel and incremental approach for data-intensive learning of Bayesian networks (BNs) from massive, distributed and dynamically changing data by extending the classic scoring-and-search algorithm and using MapReduce. First, we adopt the minimum description length (MDL) as the scoring metric and give a two-pass MapReduce-based algorithm for computing the required marginal probabilities and scoring candidate graphical models on sample data. Then, we give the corresponding strategy for extending the classic hill-climbing algorithm to obtain the optimal structure, as well as a strategy for storing a BN as <key, value> pairs. Furthermore, in view of the dynamic characteristics of changing data, we introduce the concept of influence degree to measure how well the current BN coincides with new data, and then propose a two-pass MapReduce-based algorithm for incremental learning of the BN. Experimental results show the efficiency, scalability and effectiveness of our methods.
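To make the scoring step more concrete, the following is a minimal single-machine sketch of MDL scoring driven by map/reduce-style counting. It is not the chapter's algorithm: the function names (`map_counts`, `reduce_counts`, `mdl_score`), the dictionary-based representation of a structure (node mapped to its parent list), and the specific MDL form used here (negative log-likelihood plus a `0.5 * log(N)` penalty per free parameter, lower is better) are all assumptions made for illustration.

```python
import math
from collections import Counter

def map_counts(record, structure):
    """Map step (illustrative): emit ((node, value, parent_values), 1) per family."""
    for node, parents in structure.items():
        parent_values = tuple(record[p] for p in parents)
        yield (node, record[node], parent_values), 1

def reduce_counts(pairs):
    """Reduce step (illustrative): sum the emitted values for each key."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts

def mdl_score(counts, structure, cardinality, n_samples):
    """Assumed MDL form: -log-likelihood + 0.5 * log(N) * (free parameters)."""
    total = 0.0
    for node, parents in structure.items():
        family = {k: c for k, c in counts.items() if k[0] == node}
        # N(pa): number of samples with each parent configuration
        parent_totals = Counter()
        for (_, _, parent_values), c in family.items():
            parent_totals[parent_values] += c
        log_lik = sum(
            c * math.log(c / parent_totals[parent_values])
            for (_, _, parent_values), c in family.items()
        )
        n_params = (cardinality[node] - 1) * math.prod(
            cardinality[p] for p in parents
        )
        total += -log_lik + 0.5 * math.log(n_samples) * n_params
    return total

def score_structure(records, structure, cardinality):
    """Count with the map/reduce pair above, then score the structure."""
    pairs = (kv for r in records for kv in map_counts(r, structure))
    return mdl_score(reduce_counts(pairs), structure, cardinality, len(records))

# Toy data (hypothetical): two binary variables that always agree.
records = [{"A": 0, "B": 0}, {"A": 0, "B": 0}, {"A": 1, "B": 1}, {"A": 1, "B": 1}]
cardinality = {"A": 2, "B": 2}
score_empty = score_structure(records, {"A": [], "B": []}, cardinality)
score_edge = score_structure(records, {"A": [], "B": ["A"]}, cardinality)
```

On this toy data the structure with the edge A → B gets the lower (better) score, since the extra parameter cost is outweighed by the gain in likelihood. A hill-climbing search would use such comparisons to decide which single-edge change to accept at each step.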