Learning Markov Blanket Bayesian Network for Big Data in MapReduce.

Yuxin Che, Shaohui Hong, Defu Zhang, Liming Zhang
DOI: https://doi.org/10.1109/ictai.2016.0138
2016-01-01
Abstract: A challenging task in data mining is processing massive data in the big data era, and MapReduce is an attractive model for meeting this challenge. This paper presents a new method to accelerate the learning of Markov blanket Bayesian networks (MBBN). For some complex datasets, the Markov blanket is a better-suited type of Bayesian network model. The time and space costs of learning a Markov blanket are large and grow quickly as the number of variables increases. The independence tests also require large amounts of data, which makes the problem harder. The statistical phase and the independence tests are parallelized in the MapReduce framework so that appropriate relations among variables can be found. Computational results on four datasets show that MapReduce yields a speed-up. In particular, the Markov blanket learned in MapReduce achieves a higher accuracy rate than the naïve Bayesian and tree-augmented naïve Bayesian models.
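To illustrate the idea of parallelizing the statistical phase and the independence test, here is a minimal sketch in plain Python. It is not the authors' implementation: it simulates the map and reduce steps in a single process rather than on a MapReduce cluster, it uses an unconditional Pearson chi-square statistic as a stand-in for whatever test the paper applies, and all function names (`map_counts`, `reduce_counts`, `chi_square`, `independence_statistic`) are hypothetical.

```python
from collections import Counter
from functools import reduce


def map_counts(partition, x, y):
    """Map step (assumed): count (x, y) value pairs within one data partition."""
    return Counter((row[x], row[y]) for row in partition)


def reduce_counts(a, b):
    """Reduce step (assumed): merge two partial contingency tables."""
    return a + b


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a two-way table."""
    n = sum(table.values())
    row_tot, col_tot = Counter(), Counter()
    for (xv, yv), count in table.items():
        row_tot[xv] += count
        col_tot[yv] += count
    stat = 0.0
    for xv in row_tot:
        for yv in col_tot:
            expected = row_tot[xv] * col_tot[yv] / n
            observed = table.get((xv, yv), 0)
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, dof


def independence_statistic(partitions, x, y):
    """Simulate map and reduce locally, then compute the test statistic."""
    partials = [map_counts(p, x, y) for p in partitions]  # map phase
    table = reduce(reduce_counts, partials, Counter())    # reduce phase
    return chi_square(table)


# Toy data split into two partitions; variables "a" and "b" always agree.
partitions = [
    [{"a": 0, "b": 0}, {"a": 1, "b": 1}],
    [{"a": 0, "b": 0}, {"a": 1, "b": 1}],
]
stat, dof = independence_statistic(partitions, "a", "b")
```

Because the counts from each partition are simply summed, the map step can run on each data split independently; only the small merged contingency table is needed to evaluate the statistic. A conditional independence test would key the counts on the conditioning variables as well.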