Mining all publicly available expression data to compute dynamic microbial transcriptional regulatory networks

Anand V. Sastry, Saugat Poudel, Kevin Rychel, Reo Yoo, Cameron R. Lamoureux, Siddharth Chauhan, Zachary B. Haiman, Tahani Al Bulushi, Yara Seif, Bernhard O. Palsson
DOI: https://doi.org/10.1101/2021.07.01.450581
2021-07-02
Abstract: We are firmly in the era of biological big data. Millions of omics datasets are publicly accessible and can be employed to support scientific research or build a holistic view of an organism. Here, we introduce a workflow that converts all public gene expression data for a microbe into a dynamic representation of the organism’s transcriptional regulatory network. This five-step process walks researchers through the mining, processing, curation, analysis, and characterization of all available expression data, using Bacillus subtilis as an example. The resulting reconstruction of the B. subtilis regulatory network can be leveraged to predict new regulons and analyze datasets in the context of all published data. The results are hosted at https://imodulondb.org/, and additional analyses can be performed using the PyModulon Python package. As the number of publicly available datasets increases, this pipeline will be applicable to a wide range of microbial pathogens and cell factories.