Abstract

Background: Large-scale sequencing of entire genomes has ushered in a new age in biology. One of the next grand challenges is to dissect the cellular networks consisting of many individual functional modules. Defining co-expression networks without ambiguity from genome-wide microarray data is difficult, and current methods are neither robust nor consistent across data sets. This is particularly problematic for poorly understood organisms, since little existing biological knowledge can be exploited to determine the threshold that separates true correlation from random noise. Random matrix theory (RMT), which has been widely and successfully used in physics, is a powerful approach for distinguishing system-specific, non-random properties embedded in complex systems from random noise. Here, we hypothesize that the universal predictions of RMT are also applicable to biological systems and that the correlation threshold can be determined by characterizing the correlation matrix of microarray profiles using RMT.

Results: Application of RMT to microarray data of S. oneidensis, E. coli, yeast, A. thaliana, Drosophila, mouse and human indicates that there is a sharp transition in the nearest neighbour spacing distribution (NNSD) of the correlation matrix after gradually removing certain elements inside the matrix. Testing on an in silico modular model has demonstrated that this transition can be used to determine the correlation threshold for revealing modular co-expression networks. The co-expression network derived from yeast cell cycling microarray data is supported by gene annotation. The topological properties of the resulting co-expression network agree well with the general properties of biological networks. Computational evaluations have shown that the RMT approach is sensitive and robust. Furthermore, evaluation on sampled expression data of an in silico modular gene system has shown that under-sampled expression data do not affect the recovery of the gene co-expression network. Moreover, the cellular roles of 215 functionally unknown genes from yeast, E. coli and S. oneidensis are predicted from the gene co-expression networks using the guilt-by-association principle. Many of these predictions are supported by existing information or by our experimental verification, further demonstrating the reliability of this approach for gene function prediction.

Conclusion: Our rigorous analysis of gene expression microarray profiles using RMT has shown that the transition of the NNSD of the correlation matrix of microarray profiles provides a profound theoretical criterion for determining the correlation threshold used to identify gene co-expression networks.
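The thresholding idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made for illustration: elements are removed by absolute value, the two reference distributions are the Wigner surmise for the Gaussian orthogonal ensemble (the random-matrix regime) and the exponential, or Poisson, distribution (the uncorrelated regime), and the distance measure is a Kolmogorov-Smirnov statistic. Spectral unfolding is replaced by a crude shortcut that keeps only the central part of the spectrum and rescales the spacings to unit mean. All function names are hypothetical.

```python
import numpy as np

def threshold_matrix(corr, cutoff):
    """Zero every entry whose absolute value is below cutoff, keeping the diagonal."""
    out = np.where(np.abs(corr) >= cutoff, corr, 0.0)
    np.fill_diagonal(out, np.diag(corr))
    return out

def unit_mean_spacings(matrix, keep=0.5):
    """Nearest neighbour eigenvalue spacings from the centre of the spectrum.

    Crude stand-in for proper unfolding (an assumption of this sketch): keep the
    central `keep` fraction of eigenvalues, where the eigenvalue density is
    roughly flat, and rescale the spacings to mean 1.
    """
    eigvals = np.linalg.eigvalsh(matrix)  # symmetric input, ascending order
    n = len(eigvals)
    lo = int(n * (1 - keep) / 2)
    spacings = np.diff(eigvals[lo:n - lo])
    return spacings / spacings.mean()

def wigner_cdf(s):
    """CDF of the Wigner surmise P(s) = (pi/2) s exp(-pi s^2 / 4)."""
    return 1.0 - np.exp(-np.pi * s**2 / 4.0)

def poisson_cdf(s):
    """CDF of the exponential spacing law P(s) = exp(-s)."""
    return 1.0 - np.exp(-s)

def ks_distance(spacings, cdf):
    """Kolmogorov-Smirnov distance between the empirical spacings and a model CDF."""
    s = np.sort(spacings)
    m = len(s)
    model = cdf(s)
    upper = np.arange(1, m + 1) / m  # empirical CDF just after each point
    lower = np.arange(0, m) / m      # empirical CDF just before each point
    return max(np.max(upper - model), np.max(model - lower))

def looks_poisson(matrix):
    """True if the NNSD is closer to the Poisson law than to the Wigner surmise."""
    s = unit_mean_spacings(matrix)
    return ks_distance(s, poisson_cdf) < ks_distance(s, wigner_cdf)
```

Under these assumptions, a candidate correlation threshold would be the smallest cutoff on an ascending grid for which `looks_poisson(threshold_matrix(corr, cutoff))` first becomes true. At very high cutoffs the thresholded matrix approaches the identity and its eigenvalues become degenerate, so a practical scan would need to stop before that point.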

Deciphering eukaryotic gene-regulatory logic with 100 million random promoters

Deciphering the regulatory genome of $\textit{Escherichia coli}$, one hundred promoters at a time

Deciphering the cis-regulatory landscape of natural yeast Transcript Leaders

Wide-Scale Analysis of Human Functional Transcription Factor Binding Reveals a Strong Bias towards the Transcription Start Site

Deciphering regulatory architectures from synthetic single-cell expression patterns

Constructing Gene Co-Expression Networks and Predicting Functions of Unknown Genes by Random Matrix Theory

Machine learning for regulatory analysis and transcription factor target prediction in yeast

Decoding transcriptional regulation via a human gene expression predictor

Genome-wide regulatory complexity in yeast promoters: separation of functionally conserved and neutral sequence.

Unraveling determinants of transcription factor binding outside the core binding site

Accurate prediction of gene expression by integration of DNA sequence statistics with detailed modeling of transcription regulation

Understanding Transcriptional Regulation by Integrative Analysis of Transcription Factor Binding Data

Compatibility rules of human enhancer and promoter sequences

A unified-field theory of genome organization and gene regulation

Understanding Distal Transcriptional Regulation from Sequence Motif, Network Inference and Interactome Perspectives

Deciphering transcriptional dynamics in vivo by counting nascent RNA molecules

Correlating overrepresented upstream motifs to gene expression: a computational approach to regulatory element discovery in eukaryotes

The evolution, evolvability and engineering of gene regulatory DNA

The regulatory grammar of human promoters uncovered by MPRA-trained deep learning

A high-throughput synthetic biology approach for studying combinatorial chromatin-based transcriptional regulation

Determination and Inference of Eukaryotic Transcription Factor Sequence Specificity