Machine learning identifies key metabolic reactions in bacterial growth on different carbon sources

Hyunjae Woo, Youngshin Kim, Dohyeon Kim, Sung Ho Yoon
DOI: https://doi.org/10.1038/s44320-024-00017-w
IF: 13.068
2024-01-30
Molecular Systems Biology
Abstract: Carbon source-dependent control of bacterial growth is fundamental to bacterial physiology and survival. However, pinpointing the metabolic steps important for cell growth is challenging due to the complexity of cellular networks. Here, the elastic net model and multilayer perceptron model that integrated genome-wide gene-deletion data and simulated flux distributions were constructed to identify metabolic reactions beneficial or detrimental to Escherichia coli grown on 30 different carbon sources. Both models outperformed traditional in silico methods by identifying not just essential reactions but also nonessential ones that promote growth. They successfully predicted metabolic reactions beneficial to cell growth, with high convergence between the models. The models revealed that biosynthetic pathways generally promote growth across various carbon sources, whereas the impact of energy-generating pathways varies with the carbon source. Intriguing predictions were experimentally validated for findings beyond experimental training data and the impact of various carbon sources on the glyoxylate shunt, pyruvate dehydrogenase reaction, and redundant purine biosynthesis reactions. These highlight the practical significance and predictive power of the models for understanding and engineering microbial metabolism.
biochemistry & molecular biology
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper uses machine learning to identify metabolic reactions that are beneficial or detrimental to the growth of Escherichia coli on different carbon sources. Specifically, the researchers constructed two models, elastic net regression (EN) and a multilayer perceptron (MLP), that integrate genome-wide gene-deletion data with simulated flux distributions. The models identify not only reactions that are essential for cell growth but also nonessential reactions that promote it.

### Main Findings

1. **Model Performance Superior to Traditional Methods**: The EN and MLP models outperformed traditional in silico methods by identifying nonessential growth-promoting reactions in addition to essential ones.
2. **High Consistency in Predictions**: The two models largely agreed on which metabolic reactions are beneficial to cell growth.
3. **Experimental Validation**: The researchers experimentally validated the model predictions, including findings beyond the experimental training data and the effect of carbon source on the glyoxylate shunt, the pyruvate dehydrogenase reaction, and redundant purine biosynthesis reactions.
4. **Metabolic Pathway Analysis**: Biosynthetic pathways generally promote growth across carbon sources, whereas the impact of energy-generating pathways varies with the carbon source.

### Experimental Design

- **Data Preparation**: The researchers collected growth data for an E. coli gene knockout library on 30 different carbon sources and generated model inputs through Minimization of Metabolic Adjustment (MOMA) flux simulations.
- **Model Construction**: EN and MLP models were constructed, and the MLP model was interpreted using SHAP (SHapley Additive exPlanations).
- **Model Evaluation**: Model accuracy was evaluated against experimental data, and additional independent gene knockout data were used for validation.
- **Experimental Validation**: Growth experiments were conducted to test the predicted impact of specific metabolic reactions on cell growth.

These findings indicate that combining experimental data with flux predictions can capture the complexity and nonlinear behavior of biological systems, providing a useful tool for understanding and engineering microbial metabolism.
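
The model construction step described above can be illustrated with a minimal sketch of elastic net regression, in which simulated reaction fluxes are the features and measured growth is the target. This is not the authors' implementation. The data are synthetic, the reaction names are hypothetical, the `alpha` and `l1_ratio` values are arbitrary, and the solver is a plain standard-library coordinate descent chosen so the example is self-contained.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 part of the penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net(X, y, alpha=0.05, l1_ratio=0.5, n_iter=100):
    """Coordinate descent for the objective
    (1/2n)*||y - b - Xw||^2 + alpha*l1_ratio*||w||_1
        + 0.5*alpha*(1 - l1_ratio)*||w||_2^2.
    alpha and l1_ratio are illustrative values, not taken from the paper.
    """
    n, p = len(X), len(X[0])
    columns = [[row[j] for row in X] for j in range(p)]
    w = [0.0] * p
    b = sum(y) / n
    resid = [yi - b for yi in y]  # residual y - b - Xw for the current fit
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            z = sum(v * v for v in col) / n
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(c * (r + w[j] * c) for c, r in zip(col, resid)) / n
            new_wj = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1 - l1_ratio))
            delta = new_wj - w[j]
            resid = [r - delta * c for c, r in zip(col, resid)]
            w[j] = new_wj
        # The intercept is unpenalized: re-center so the mean residual is zero.
        shift = sum(resid) / n
        b += shift
        resid = [r - shift for r in resid]
    return w, b


# Synthetic stand-in data (assumption): each row is one deletion strain, each
# column a simulated flux through a hypothetical reaction.
random.seed(0)
reactions = ["rxn_beneficial", "rxn_detrimental", "rxn_neutral_1", "rxn_neutral_2"]
fluxes = [[random.gauss(0.0, 1.0) for _ in reactions] for _ in range(200)]
growth = [1.0 + 0.8 * f[0] - 0.5 * f[1] + random.gauss(0.0, 0.1) for f in fluxes]

weights, intercept = fit_elastic_net(fluxes, growth)
# Rank reactions from most growth-promoting to most growth-inhibiting.
for name, weight in sorted(zip(reactions, weights), key=lambda t: -t[1]):
    print(f"{name:16s} {weight:+.3f}")
```

In a sketch like this, a positive weight marks a reaction whose flux is associated with faster growth and a negative weight marks one associated with slower growth, while the L1 term pushes uninformative reactions toward zero. That sparsity is one plausible reason to pick elastic net when the number of reactions is large relative to the number of strains.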