12-1-2011 Learning the Structure of Gene Regulatory Networks From Time Series Gene Expression Data

Haoni Li, Nan Wang, P. Gong, E. Perkins, Chaoyang Zhang
2018-01-01
Abstract:
Background: The dynamic Bayesian network (DBN) is a widely used approach for reconstructing gene regulatory networks from time-series microarray data. Its performance in network reconstruction depends on the structure learning algorithm. REVEAL (REVerse Engineering ALgorithm) is one of the algorithms implemented for learning DBN structure and used to reconstruct gene regulatory networks (GRNs). However, the two-stage temporal Bayes network (2TBN) structure of a DBN, which specifies the correlation between time slices, cannot be obtained with the score metrics used in REVEAL.
Methods: In this paper, we study a more sophisticated score function for DBNs, first proposed by Nir Friedman for learning the structure of both the initial and transition networks of stationary DBNs, which has not yet been used for reconstruction of GRNs. We implemented Friedman’s Bayesian Information Criterion (BIC) score function, modified the K2 algorithm to learn DBN structure with this score function, and tested the performance of the algorithm for GRN reconstruction on synthetic time-series gene expression data generated by GeneNetWeaver and on real yeast benchmark experiment data.
Results: We implemented an algorithm for DBN structure learning with Friedman’s score function, tested it on reconstruction of both synthetic networks and real yeast networks, and compared it with REVEAL in the absence or presence of a preprocessed network generated by Zou and Conzen’s algorithm. By introducing a stationary correlation between two consecutive time slices, Friedman’s score function showed higher precision and recall than the naive REVEAL algorithm.
Conclusions: Friedman’s score metric for DBNs can be used to reconstruct transition networks and has great potential to improve the accuracy of gene regulatory network structure prediction with time-series gene expression datasets.
Li et al. BMC Genomics 2011, 12(Suppl 5):S13. http://www.biomedcentral.com/1471-2164/12/S5/S13. Correspondence: chaoyang.zhang@usm.edu, School of Computing, University of Southern Mississippi, Hattiesburg, MS 39406, USA. Full list of author information is available at the end of the article. © 2011 Li et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background
High-content technologies such as DNA microarrays can provide a system-scale overview of how genes interact with each other in a network context. This network is called a gene regulatory network (GRN) and can be defined as a mixed graph over a set of nodes (corresponding to genes or gene activities) with directed or undirected edges (representing causal interactions or associations between gene activities) [1]. Various mathematical methods and computational approaches have been proposed to reconstruct GRNs, including Boolean networks [2], information theory [3,4], differential equations [5] and Bayesian networks [6-8]. GRN reconstruction faces huge intrinsic challenges on both experimental and theoretical fronts, because the inputs and outputs of the molecular processes are unclear and the underlying principles are unknown or too complex. In previous work, we compared two important computational approaches, dynamic Bayesian networks (DBNs) and probabilistic Boolean networks (PBNs), for reconstructing GRNs using a time-series dataset from the Drosophila Interaction Database, and found that DBN outperforms PBN [9]. In this paper, we focus on the DBN approach. Dynamic Bayesian networks (DBNs) are belief networks that represent the stochastic process of a set of random variables over time.
The hidden Markov model (HMM) and the Kalman filter can be considered the simplest DBNs. However, Kalman filters can only handle unimodal posterior distributions and linear models, whereas the parameterization of an HMM grows exponentially with the number of state variables [10]. Several algorithms have been developed to learn the structure of belief networks from both complete [6,10-12] (without missing values) and incomplete [13,14] (with missing values) datasets. Structure Expectation-Maximization (SEM) has been developed for learning probabilistic network structure from data with hidden variables and missing values [13]. A structure learning algorithm has also been developed for high-order and non-stationary dynamic probabilistic models [15]. A commonly used structure learning algorithm is based on REVEAL (REVerse Engineering ALgorithm) [6,12], which learns the optimal set of parents for each node of a network independently, based on the information-theoretic concept of mutual information. However, the two-stage temporal Bayes network (2TBN) cannot be well recovered by applying REVEAL. In this work, we implemented a more sophisticated algorithm, proposed by Friedman [10], to learn the structure of both initial networks and transition networks, which specify a stationary correlation between two consecutive time periods. Compared with Murphy’s algorithm, it improves performance in two ways. First, its score function accounts for time lags that may occur in biological processes. Second, it selects the relationships that maximize the score function, either within the same time period or across two consecutive time periods. Thus, Friedman’s DBN structure learning algorithm was used in our work, and its performance in terms of reconstruction accuracy was evaluated using synthetic gene expression datasets and a real yeast time-series benchmark dataset.
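To make the idea of score-based parent selection for a transition network concrete, the following is a minimal sketch and not the implementation evaluated in this paper. It assumes binary-discretized expression values, edges only from time t-1 to time t (no edges within a time slice), and a generic BIC of the form log-likelihood minus (log N / 2) times the number of free parameters. The function names `family_bic` and `greedy_parents`, the parameter `max_parents`, and the toy data are all hypothetical and chosen for illustration only.

```python
import math
from collections import Counter


def family_bic(data, child, parents, n_states=2):
    """BIC-style score for `child` at time t given `parents` at time t-1.

    data: list of time points, each a tuple of discrete gene states.
    Assumption: every gene takes one of `n_states` values.
    """
    n = len(data) - 1  # number of observed (t-1, t) transitions
    joint = Counter()  # counts of (parent configuration, child state)
    marg = Counter()   # counts of each parent configuration alone
    for prev, curr in zip(data, data[1:]):
        cfg = tuple(prev[p] for p in parents)
        joint[(cfg, curr[child])] += 1
        marg[cfg] += 1
    # Maximum-likelihood log-likelihood of the child given its parents.
    loglik = sum(c * math.log(c / marg[cfg]) for (cfg, _), c in joint.items())
    # Free parameters: (n_states - 1) per parent configuration.
    n_params = (n_states - 1) * n_states ** len(parents)
    return loglik - 0.5 * math.log(n) * n_params


def greedy_parents(data, child, max_parents=2, n_states=2):
    """K2-style greedy search: add the best-scoring parent until no gain."""
    n_genes = len(data[0])
    parents = []
    best = family_bic(data, child, parents, n_states)
    while len(parents) < max_parents:
        scored = [
            (family_bic(data, child, parents + [g], n_states), g)
            for g in range(n_genes)
            if g not in parents
        ]
        if not scored:
            break
        score, g = max(scored)
        if score <= best:  # no candidate improves the score: stop
            break
        parents.append(g)
        best = score
    return parents, best


# Toy data (hypothetical): gene 1 at time t copies gene 0 at time t-1,
# and gene 2 follows an unrelated periodic pattern.
g0 = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1]
data = [(g0[t], g0[t - 1] if t > 0 else 0, (t // 2) % 2) for t in range(len(g0))]

parents, score = greedy_parents(data, child=1)
print(parents, round(score, 3))  # prints [0] -2.944
```

On this toy series the search recovers gene 0 as the only parent of gene 1, because adding a second parent cannot raise the likelihood but doubles the penalty term. Unlike a mutual-information criterion applied to each node in isolation, the penalty here explicitly trades goodness of fit against the number of parameters introduced by each additional parent.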
In the following sections, we first provide an introduction to DBNs and existing DBN algorithms for reconstruction of GRNs. We then present an implementation of Friedman’s DBN algorithm. Finally, we apply the algorithm to synthetic datasets and a real yeast benchmark dataset, and compare its performance with that of the commonly used DBN algorithm of Murphy [12,16], which is based on REVEAL.