Abstract: In this dissertation we concentrate on learning Bayesian Networks (BN) from distributed heterogeneous databases. We need to develop distributed techniques that save communication overhead, offer better scalability, and require minimal communication of possibly secure data. The objective of this work is to learn a collective BN from data that is distributed among geographically diverse sites. The data distribution is heterogeneous. The collective BN must be close to a BN learned by a centralized method and must require only a small amount of data transmission among different sites. In general, the collective learning algorithms have four steps: local learning, sample selection, cross learning, and combination. The key points in the proposed methods are: (1) use the BN decomposability property; (2) identify the samples that are most likely to be evidence of cross terms. We show that low-likelihood samples in each site are most likely to be the evidence of cross terms. One collective structure learning method and two collective parameter learning methods are proposed. For structure learning, the collective method can find the correct structure of local variables by choosing a base structure learning algorithm with the decomposability property. Some extra links may be introduced due to the hidden variable problem. Sample selection chooses low-likelihood samples in local sites and transmits them to a central site. In cross learning, the structure of the cross variables and the cross set are identified. In combination, we add all cross links and remove extra local links. For parameter learning, Collective Method 1 (CM1) and Collective Method 2 (CM2) can learn a BN which is close to B_cntr (the BN learned by a centralized method) using a small portion of samples. Local learning learns parameters for local variables. Cross learning learns the parameters of cross variables. The combination step aggregates the parameters of local variables and cross variables.
In order to handle applications with real-time constraints, we have developed CM2. Using a notion of cross set, CM2 chooses a subset of features in a local site to do the likelihood computation and data selection. This can greatly reduce the local computation and the data transmission overhead. Experimental results demonstrate the efficiency and accuracy of these methods.
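
The sample selection step described above can be illustrated with a minimal sketch. It is not the dissertation's implementation: it assumes binary variables, a local structure that is already known, and Laplace-smoothed parameter estimates, and every name in it (`fit_local_cpts`, `log_likelihood`, `select_low_likelihood`, the variables `A` and `B`) is hypothetical. A site fits its local BN, scores each of its samples under that model, and picks the lowest-likelihood samples as the candidates to transmit to the central site.

```python
import math

def fit_local_cpts(samples, parents, alpha=1.0):
    """Estimate P(v = 1 | parents of v) for binary variables.

    Assumption: Laplace smoothing with pseudo-count `alpha`.
    """
    cpts = {}
    for v, pa in parents.items():
        counts = {}
        for row in samples:
            key = tuple(row[p] for p in pa)
            ones, total = counts.get(key, (0, 0))
            counts[key] = (ones + row[v], total + 1)
        cpts[v] = {
            key: (ones + alpha) / (total + 2 * alpha)
            for key, (ones, total) in counts.items()
        }
    return cpts

def log_likelihood(row, parents, cpts):
    """Log-likelihood of one sample under the local BN."""
    ll = 0.0
    for v, pa in parents.items():
        # Unseen parent configurations fall back to 0.5 (an assumption).
        p_one = cpts[v].get(tuple(row[p] for p in pa), 0.5)
        ll += math.log(p_one if row[v] == 1 else 1.0 - p_one)
    return ll

def select_low_likelihood(samples, parents, cpts, k):
    """Return the indices of the k samples the local model explains worst."""
    order = sorted(
        range(len(samples)),
        key=lambda i: log_likelihood(samples[i], parents, cpts),
    )
    return order[:k]

# Hypothetical local site with two variables and the local structure A -> B.
parents = {"A": (), "B": ("A",)}
samples = (
    [{"A": 0, "B": 0}] * 4      # indices 0-3: consistent with the local model
    + [{"A": 1, "B": 1}] * 4    # indices 4-7: consistent with the local model
    + [{"A": 0, "B": 1}, {"A": 1, "B": 0}]  # indices 8-9: poorly explained
)

cpts = fit_local_cpts(samples, parents)
selected = select_low_likelihood(samples, parents, cpts, k=2)
print(sorted(selected))
```

In this toy data the two samples that break the dominant local pattern receive the lowest scores, so they are the ones selected for transmission. How many samples to send (`k` here) is a tuning choice that the sketch leaves open.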

Learning Bayesian Network Structure from Distributed Data

Collective Approach for Bayesian Network Learning from Distributed Heterogeneous Database

Learning bayesian networks using domain knowledge: An empirical study

Distributed Learning of Predictive Structures from Multiple Tasks over Networks

Bayesian Discovery of Multiple Bayesian Networks via Transfer Learning

Bayesian learning of multiple directed networks from observational data

PEnBayes: A Multi-Layered Ensemble Approach for Learning Bayesian Network Structure from Big Data

Local Structure Discovery in Bayesian Networks

A survey of Bayesian Network structure learning

Structure Learning for Hybrid Bayesian Networks

A Method for Hybrid Bayesian Network Structure Learning from Massive Data Using MapReduce

A Multi-Granularity Information-Based Method for Learning High-Dimensional Bayesian Network Structures

Improved Population-Based Incremental Learning of Bayesian Networks with partly known structure and parallel computing

A New Method Of Learning Bayesian Networks Structures From Incomplete Data

Asynchronous Local Computations in Distributed Bayesian Learning

LSBN: A Large-Scale Bayesian Structure Learning Framework for Model Averaging

Collaborative Learning by Boosting in Distributed Environments

Hybrid Parrallel Bayesian Network Structure Learning from Massive Data Using MapReduce

Learning Bayesian Network Structure from Small Data Set

Learning Bayesian Networks: A Copula Approach for Mixed-Type Data

Asynchronous Bayesian Learning over a Network