Abstract:

Background: Availability of linked biomedical and social science data has risen dramatically in recent decades, facilitating holistic and systems-based analyses. Among the methods suited to such analyses, Bayesian networks have great potential to tackle complex interdisciplinary problems, because they can easily model inter-relations between variables. They work by encoding conditional independence relationships discovered via advanced inference algorithms. One challenge is dealing with missing data, which are ubiquitous in survey and biomedical datasets. Missing data are rarely addressed in an advanced way in Bayesian networks; the most common approach is to discard all samples containing missing measurements, which can lead to biased estimates. Here, we examine how Bayesian network structure learning can incorporate missing data.

Methods: We use a simulation approach to compare a method commonly used in frequentist statistics, multiple imputation by chained equations (MICE), with one specific to Bayesian network learning, structural expectation-maximization (SEM). We simulate multiple incomplete categorical (discrete) data sets with different missingness mechanisms, numbers of variables, amounts of data, and missingness proportions. We evaluate the performance of MICE and SEM in capturing network structure. We then apply SEM combined with community analysis to a real-world dataset of linked biomedical and social data to investigate associations between socio-demographic factors and multiple chronic conditions in the US elderly population.

Results: We find that applying either method (MICE or SEM) provides better structure recovery than doing nothing, and that SEM in general outperforms MICE. This finding is robust across missingness mechanisms, numbers of variables, amounts of data, and missingness proportions. We also find that data imputed by SEM are more accurate than data imputed by MICE. Our real-world application recovers known inter-relationships among socio-demographic factors and common multimorbidities. This network analysis also highlights potential areas of investigation, such as links between cancer and cognitive impairment and a disconnect between self-assessed memory decline and standard cognitive impairment measurement.

Conclusion: Our simulation results suggest that taking advantage of the additional information provided by network structure during SEM improves the performance of Bayesian networks; this might be especially useful for social science and other interdisciplinary analyses. Our case study shows that comorbidities of different diseases interact with each other and are closely associated with socio-demographic factors.
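The claim that discarding incomplete samples can bias estimates can be illustrated with a minimal sketch. It is not the simulation design used in the study: the two-node network A -> B, its probabilities, the missingness rates, and all function names are assumptions chosen for illustration. The sketch hides values of B under two standard mechanisms, missing completely at random (MCAR) and missing at random given A (MAR), and estimates P(B=1) from the complete cases only.

```python
import numpy as np

def simulate_network(n, rng):
    """Sample n rows from a toy binary network A -> B (assumed parameters)."""
    a = (rng.random(n) < 0.5).astype(int)        # P(A=1) = 0.5
    p_b = np.where(a == 1, 0.8, 0.2)             # P(B=1 | A)
    b = (rng.random(n) < p_b).astype(int)
    return a, b

def mcar_mask(n, rate, rng):
    """MCAR: each value of B is missing with the same probability."""
    return rng.random(n) < rate

def mar_mask(a, rate_a1, rate_a0, rng):
    """MAR: the chance that B is missing depends only on the observed A."""
    rate = np.where(a == 1, rate_a1, rate_a0)
    return rng.random(a.size) < rate

def complete_case_mean(b, missing):
    """Estimate P(B=1) after discarding rows where B is missing."""
    return b[~missing].mean()

rng = np.random.default_rng(0)
a, b = simulate_network(200_000, rng)

est_mcar = complete_case_mean(b, mcar_mask(b.size, 0.3, rng))
est_mar = complete_case_mean(b, mar_mask(a, 0.6, 0.0, rng))

print(f"true P(B=1) = 0.500")
print(f"complete-case estimate under MCAR: {est_mcar:.3f}")
print(f"complete-case estimate under MAR:  {est_mar:.3f}")
```

With these assumed parameters the true value of P(B=1) is 0.5. Under MCAR the complete-case estimate should stay close to it, while under MAR the retained rows over-represent A=0 and the estimate tends toward (0.5 × 0.4 × 0.8 + 0.5 × 0.2) / (0.5 × 0.4 + 0.5) ≈ 0.37. Methods such as MICE or SEM aim to use the observed variables to avoid this kind of distortion.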

A Parallel Algorithm for Learning Bayesian Networks

Learning Bayesian Networks Using a Parallel EM Approach

Learning bayesian networks using domain knowledge: An empirical study

Improved Population-Based Incremental Learning of Bayesian Networks with partly known structure and parallel computing

A Parallel Algorithm for Bayesian Network Parameter Learning Based on Factor Graph

Scaling Bayesian Network Parameter Learning with Expectation Maximization using MapReduce

PEnBayes: A Multi-Layered Ensemble Approach for Learning Bayesian Network Structure from Big Data

Parallel Learning of Bayesian Networks Based on Ordering of Sets

Parallel structural learning of Bayesian networks: Iterative divide and conquer algorithm based on structural fusion

Learning Bayesian Networks From Data: An Efficient Approach Based On Extended Evolutionary Programming

A Comprehensively Improved Hybrid Algorithm for Learning Bayesian Networks: Multiple Compound Memory Erasing

A Parallel Algorithm for Exact Bayesian Structure Discovery in Bayesian Networks

A Learning Algorithm for Bayesian Networks and Its Efficient Implementation on GPUs

Collective Approach for Bayesian Network Learning from Distributed Heterogeneous Database

MapReduce for Bayesian Network Parameter Learning using the EM Algorithm

An Efficient Procedure for Computing Bayesian Network Structure Learning

Hybrid Parrallel Bayesian Network Structure Learning from Massive Data Using MapReduce

Treatment of missing data in Bayesian network structure learning: an application to linked biomedical and social survey data

Efficient heuristics for learning scalable Bayesian network classifier from labeled and unlabeled data

Learning Bayesian networks from big data with greedy search: computational complexity and efficient implementation

A New Method Of Learning Bayesian Networks Structures From Incomplete Data