Machine learning analysis of RB-TnSeq fitness data predicts functional gene modules in Pseudomonas putida KT2440

Andrew J. Borchert, Alissa C. Bleem, Hyun Gyu Lim, Kevin Rychel, Keven D. Dooley, Zoe A. Kellermyer, Tracy L. Hodges, Bernhard O. Palsson, Gregg T. Beckham
DOI: https://doi.org/10.1128/msystems.00942-23
2024-02-07
mSystems
Abstract: This study demonstrates a rapid, automated approach for elucidating functional modules within complex genetic networks. While Pseudomonas putida randomly barcoded transposon insertion sequencing data were used as a proof of concept, this approach is applicable to any organism with existing functional genomics data sets and may serve as a useful tool for many valuable applications, such as guiding metabolic engineering efforts in other microbes or understanding functional relationships between virulence-associated genes in pathogenic microbes. Furthermore, this work demonstrates that comparison of data obtained from independent component analysis of transcriptomics and gene fitness datasets can elucidate regulatory-functional relationships between genes, which may have utility in a variety of applications, such as metabolic modeling, strain engineering, or identification of antimicrobial drug targets.
microbiology
What problem does this paper attempt to address?
This paper uses machine learning to analyze RB-TnSeq fitness data from Pseudomonas putida KT2440 and predict its functional gene modules. Specifically, the researchers applied independent component analysis (ICA) to RB-TnSeq data collected under 179 different experimental conditions from a library of randomly barcoded transposon-insertion mutants of P. putida KT2440. With ICA, they aimed to identify groups of genes that share a common functional impact, termed functional modules (fModules), thereby accelerating gene function annotation and reducing the time needed to manually curate RB-TnSeq datasets. The main objectives of the paper are:

1. **Identify functional modules**: Use the ICA algorithm to rapidly decompose complex gene networks into functionally independent groups of genes whose members show similar functional impacts in specific cellular processes.
2. **Validate functional modules**: Select fModules related to hydroxycinnamic acid metabolism, stress resistance, acetyl-CoA assimilation, and nitrogen metabolism, and test the predicted relationships by constructing and characterizing engineered mutants.
3. **Compare gene regulation and function**: Compare the functional gene modules derived from the RB-TnSeq dataset with regulatory gene modules previously obtained by ICA of RNA-seq datasets, to reveal relationships between gene regulation and function.

Using this approach, the researchers reproduced previously known functional relationships and also established new associations between genes, providing a useful tool for metabolic engineering and for studying functional relationships in other microorganisms.
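The ICA step in the first objective can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes a genes × conditions fitness matrix, uses a hand-written symmetric FastICA with a tanh nonlinearity (one common ICA variant; the paper's exact implementation and parameters are not described here), and defines each module with a hypothetical fixed z-score cutoff (`z_cutoff=3.0`) on the gene weights. The function names `fast_ica` and `extract_modules` and the synthetic data are illustrative only.

```python
import numpy as np


def fast_ica(X, n_components, max_iter=500, tol=1e-6, seed=0):
    """Symmetric FastICA (tanh nonlinearity) on a genes x conditions matrix.

    Genes are treated as observations, so S (genes x k) holds per-gene
    weights and A (k x conditions) holds per-condition activities,
    with X - X.mean(axis=0) approximately equal to S @ A.
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                  # center each condition
    n = Xc.shape[0]
    # Whiten with PCA: keep the top-k eigenvectors of the condition covariance
    d, E = np.linalg.eigh(Xc.T @ Xc / n)     # eigenvalues in ascending order
    d, E = d[-n_components:], E[:, -n_components:]
    K = (E / np.sqrt(d)).T                   # k x conditions whitening matrix
    Z = Xc @ K.T                             # genes x k, identity covariance

    def decorrelate(W):
        # Symmetric decorrelation: W <- (W W^T)^(-1/2) W
        s, u = np.linalg.eigh(W @ W.T)
        return u @ np.diag(1.0 / np.sqrt(s)) @ u.T @ W

    W = decorrelate(rng.standard_normal((n_components, n_components)))
    for _ in range(max_iter):
        g = np.tanh(Z @ W.T)                 # nonlinearity, genes x k
        g_prime = 1.0 - g ** 2               # its derivative
        W_new = decorrelate(g.T @ Z / n - np.diag(g_prime.mean(axis=0)) @ W)
        # Converged when every unmixing vector stops changing direction
        delta = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0))
        W = W_new
        if delta < tol:
            break

    U = K.T @ W.T                            # conditions x k unmixing matrix
    S = Xc @ U                               # gene weights (sources)
    A = np.linalg.pinv(U)                    # condition activities
    return S, A


def extract_modules(S, z_cutoff=3.0):
    """Genes whose |z-scored weight| exceeds a (hypothetical) fixed cutoff."""
    modules = []
    for j in range(S.shape[1]):
        z = (S[:, j] - S[:, j].mean()) / S[:, j].std()
        modules.append(set(np.flatnonzero(np.abs(z) > z_cutoff).tolist()))
    return modules


# Hypothetical demo: 400 genes x 30 conditions with three planted
# 20-gene modules that share a fitness defect (synthetic, not paper data).
rng = np.random.default_rng(1)
n_genes, n_cond, k = 400, 30, 3
M_true = np.zeros((n_genes, k))
for j in range(k):
    M_true[j * 20:(j + 1) * 20, j] = -2.0
A_true = rng.standard_normal((k, n_cond))
fitness = M_true @ A_true + 0.3 * rng.standard_normal((n_genes, n_cond))

S, A = fast_ica(fitness, n_components=k)
modules = extract_modules(S)
planted = [set(range(j * 20, (j + 1) * 20)) for j in range(k)]
```

Treating genes as observations makes each independent component a vector of per-gene weights, so genes with extreme weights in the same component form a candidate module, and the matching row of `A` indicates the conditions in which that module's fitness effect is active. In this synthetic example, each planted 20-gene block should be recovered as the module of one component.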
The method could likewise guide metabolic engineering in other microorganisms or help elucidate functional relationships among virulence-associated genes in pathogenic microorganisms.
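The third objective listed above amounts to comparing two collections of gene sets: ICA-derived fModules and the regulatory modules (iModulons) previously inferred by ICA of RNA-seq data. One simple way to quantify such overlap, sketched here as an assumption rather than the authors' exact procedure, is to pair each fModule with its best-matching iModulon by Jaccard index and score the overlap with a one-sided hypergeometric test. The gene IDs, module names, genome size (`n_genes=5000`), and the helpers `overlap_stats` and `best_matches` are all hypothetical.

```python
from math import comb


def overlap_stats(genes_a, genes_b, n_genes):
    """Jaccard index and one-sided hypergeometric p-value for a gene-set overlap."""
    a, b = set(genes_a), set(genes_b)
    shared = len(a & b)
    union = len(a | b)
    jaccard = shared / union if union else 0.0
    # P(X >= shared) for X ~ Hypergeometric(population=n_genes, successes=|a|, draws=|b|)
    upper = min(len(a), len(b))
    tail = sum(comb(len(a), i) * comb(n_genes - len(a), len(b) - i)
               for i in range(shared, upper + 1))
    return jaccard, tail / comb(n_genes, len(b))


def best_matches(fmodules, imodulons, n_genes):
    """Map each fModule to the iModulon with the highest Jaccard overlap."""
    result = {}
    for f_name, f_genes in fmodules.items():
        scored = [(overlap_stats(f_genes, i_genes, n_genes), i_name)
                  for i_name, i_genes in imodulons.items()]
        (jaccard, p_value), i_name = max(scored, key=lambda t: t[0][0])
        result[f_name] = (i_name, jaccard, p_value)
    return result


# Hypothetical gene sets and genome size, for illustration only.
fmodules = {"fM1": {"g1", "g2", "g3", "g4"}, "fM2": {"g10", "g11", "g12"}}
imodulons = {"iM_A": {"g1", "g2", "g3", "g5"}, "iM_B": {"g11", "g12", "g13", "g14"}}
matches = best_matches(fmodules, imodulons, n_genes=5000)
print(matches["fM1"][:2])  # ('iM_A', 0.6)
print(matches["fM2"][:2])  # ('iM_B', 0.4)
```

In this framing, a pair with a high Jaccard index and a small p-value would suggest genes that are both co-regulated and functionally coupled, whereas an fModule with no good regulatory match would point to a functional relationship not explained by shared transcriptional regulation.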