Learning Bayesian Network Structure from Distributed Data

Rong Chen, K. Sivakumar, H. Kargupta
DOI: https://doi.org/10.1137/1.9781611972733.31
2003-01-01
Abstract: We propose a collective method for learning the structure of a Bayesian network from distributed heterogeneous data sources. In this setting, the dataset is distributed among several sites, with different features at each site. The collective method has four steps: local learning, sample selection, cross learning, and combination of the results. The parents of local nodes can be correctly identified in local learning. The sample selection step selects samples, based on a likelihood criterion, that are possibly evidence of cross links, that is, links whose vertices are in different sites. The main task of cross learning is to identify these cross links. This is done by transmitting a small subset of samples from each local site to a central site. The combination step removes extra links from the local Bayesian networks that may have been introduced during local learning due to the well-known hidden variable problem. The overall procedure is called collective learning. Experimental results verify that, for sparsely connected networks, the collective learning method can learn the same structure as that obtained by a centralized learning method (which simply aggregates data from all local sites into a
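The sample selection and transmission steps described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the abstract only says that selection is "based on a likelihood criterion", so the sketch assumes that samples with low likelihood under a site's local model are the ones treated as possible evidence of cross links. The local model is a stand-in (independent Bernoulli features rather than a learned local Bayesian network), and the names `fit_independent_model`, `select_low_likelihood`, `collect_for_central_site`, and the `fraction` parameter are all hypothetical.

```python
import math


def fit_independent_model(rows):
    """Stand-in local model (assumption): one Laplace-smoothed Bernoulli
    probability per binary feature, in place of a learned local network."""
    n = len(rows)
    k = len(rows[0])
    return [(sum(r[j] for r in rows) + 1) / (n + 2) for j in range(k)]


def log_likelihood(row, probs):
    """Log-likelihood of one binary sample under the stand-in local model."""
    return sum(math.log(p if x == 1 else 1.0 - p) for x, p in zip(row, probs))


def select_low_likelihood(rows, probs, fraction):
    """Return sorted indices of the `fraction` of samples that the local
    model explains worst (assumed selection criterion)."""
    scores = [log_likelihood(r, probs) for r in rows]
    count = max(1, int(len(rows) * fraction))
    ranked = sorted(range(len(rows)), key=lambda i: scores[i])
    return sorted(ranked[:count])


def collect_for_central_site(site_a, site_b, fraction=0.25):
    """Each site selects sample indices locally; the union of selected
    samples is joined across sites, as a central site would receive them."""
    probs_a = fit_independent_model(site_a)
    probs_b = fit_independent_model(site_b)
    chosen = set(select_low_likelihood(site_a, probs_a, fraction))
    chosen |= set(select_low_likelihood(site_b, probs_b, fraction))
    # Rows at the same index describe the same sample at both sites.
    return [(i, site_a[i] + site_b[i]) for i in sorted(chosen)]
```

In this sketch only the selected rows leave their sites, so the central site sees full feature vectors for a small subset of samples and could run an ordinary structure learner on that subset to look for cross links.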
Computer Science