Abstract: Understanding gene regulation is an important step toward understanding how essential mechanisms are controlled in biological systems. One of the central goals of biology is to identify which transcription factors (TFs) regulate the transcription of which target genes, and what the downstream effects are. Functional assays such as ChIP-seq and DNase I together can provide a map of TF binding sites on DNA. However, binding alone may not change the expression of the target gene. Thus, functional validation is necessary to show that the binding influences target gene expression. The standard approach to functional validation is to perform artificial TF knockdown experiments and declare the differentially expressed genes to be validated target genes. Instead of artificial perturbation, we propose to leverage naturally occurring genetic variation as the source of perturbations that vary gene expression, and to analyze population SNP and gene expression data in order to validate the TF binding map. Compared to the standard approach, which perturbs the concentration of a single TF at a time, our approach is potentially more powerful, because any aspect of the TF-target interaction, including TF concentration and TF binding affinity, can be perturbed by the large number of SNPs found across the genome. In addition, we are able to leverage existing SNP and gene expression data, which are available from the popular expression quantitative trait locus (eQTL) mapping studies. We introduce a statistical approach, based on conditional Gaussian Bayesian networks, that integrates population SNP and gene expression data with TF binding data to validate the TF binding map. We develop an efficient algorithm for learning the gene regulatory network that uses the TF binding data as prior knowledge and selects the TF-target interactions that are validated by population SNP and gene expression data.
Given the estimated network, we perform inference on the estimated probabilistic graphical model to determine the downstream genes that are affected by the TF-target interactions. We demonstrate our method on ENCODE ChIP-seq and DNase I data, and on population SNP and expression data from lymphoblastoid cells, originally collected for the 1000 Genomes and HapMap 3 projects, respectively. Finally, we apply our approach to validate the TF binding map of ER and its coregulators in breast cancer using ENCODE ChIP-seq and DNase I data, and population SNP and expression data from the TCGA project.
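
The idea of using a TF binding map as a prior and keeping only the interactions supported by population data can be illustrated with a small sketch. This is not the conditional Gaussian Bayesian network learning algorithm described in the abstract; it is a simplified stand-in that assumes a linear Gaussian model per target gene and uses ordinary least squares with a t-statistic cutoff. The function names (`simulate`, `validate_edges`), the simulated effect sizes, and the threshold of 4.0 are all hypothetical choices made for illustration.

```python
import numpy as np

def simulate(n=500, seed=0):
    """Simulate a toy population (all values are illustrative assumptions)."""
    rng = np.random.default_rng(seed)
    # Genotypes coded 0/1/2, one SNP per TF, acting as natural perturbations.
    snps = rng.integers(0, 3, size=(n, 3)).astype(float)
    # Each TF's expression is shifted by its own SNP plus Gaussian noise.
    expr_tf = 0.8 * snps + rng.normal(size=(n, 3))
    # The target is truly regulated by TF 0 only.
    expr_target = 1.0 * expr_tf[:, 0] + rng.normal(size=n)
    return expr_tf, expr_target

def validate_edges(expr_tf, expr_target, candidate_tfs, t_threshold=4.0):
    """Keep candidate TFs whose regression coefficient has |t| above the cutoff.

    candidate_tfs plays the role of the binding-map prior: only TFs with
    binding evidence near the target are considered as possible parents.
    """
    n = expr_target.shape[0]
    # Design matrix: intercept plus the expression of each candidate TF.
    X = np.column_stack([np.ones(n), expr_tf[:, candidate_tfs]])
    beta, _, _, _ = np.linalg.lstsq(X, expr_target, rcond=None)
    resid = expr_target - X @ beta
    dof = n - X.shape[1]
    sigma2 = resid @ resid / dof
    # Standard errors from the usual OLS covariance estimate.
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    t_stats = beta / se
    # Skip the intercept (index 0) when reporting validated edges.
    return [tf for tf, t in zip(candidate_tfs, t_stats[1:]) if abs(t) > t_threshold]

expr_tf, expr_target = simulate()
binding_map = [0, 1]  # TFs 0 and 1 bind near the target; TF 2 does not
validated = validate_edges(expr_tf, expr_target, binding_map)
print("validated TFs:", validated)
```

In this toy setting, TF 1 binds near the target but has no effect on its expression, so it is the kind of interaction that a binding map alone would report and that expression data should fail to validate. A fuller treatment would model the SNPs explicitly as parents of the expression nodes, as the conditional Gaussian formulation in the abstract suggests.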

A Bayesian data fusion based approach for learning genome-wide transcriptional regulatory networks

Functional Inference of Gene Regulation Using Single-Cell Multi-Omics

Network Reconstruction for Trans Acting Genetic Loci Using Multi-Omics Data and Prior Information

Identifying targets of multiple co-regulating transcription factors from expression time-series by Bayesian model comparison

Bayesian joint modeling of multiple gene networks and diverse genomic data to identify target genes of a transcription factor

Reconstructing Gene Regulatory Networks with Bayesian Networks by Combining Expression Data with Multiple Sources of Prior Knowledge

Gene regulatory network inference using fused LASSO on multiple data sets

Integration of Epigenetic Data in Bayesian Network Modeling of Gene Regulatory Network

Reverse engineering highlights potential principles of large gene regulatory network design and learning

A computational framework for gene regulatory network inference that combines multiple methods and datasets

Fused Regression for Multi-source Gene Regulatory Network Inference

Dynamic Bayesian Network Approach for Modeling Gene Regulatory Networks

Bayesian variable selection and data integration for biological regulatory networks

Reverse engineering gene regulatory networks using approximate Bayesian computation

Understanding Distal Transcriptional Regulation from Sequence Motif, Network Inference and Interactome Perspectives

Bayesian non-negative factor analysis for reconstructing transcriptional regulatory network

Combining gene expression data and prior knowledge for inferring gene regulatory networks via Bayesian networks using structural restrictions

A graphical model approach visualizes regulatory relationships between genome-wide transcription factor binding profiles

Transcriptional regulatory network refinement and quantification through kinetic modeling, gene expression microarray data and information theory

Functional Validation of Transcription Factor to Gene Interactions by Statistical Learning of Gaussian Bayesian networks from SNP and Expression data

Prediction of regulatory networks: genome-wide identification of transcription factor targets from gene expression data