Abstract: We show that deep learning models, and especially architectures like the Transformer, originally intended for natural language, can be trained on randomly generated datasets to predict to very high accuracy both the qualitative and quantitative features of metabolic networks. Using standard mathematical techniques, we create large sets (40 million elements) of random networks that can be used to train our models. These trained models can predict network equilibrium on random graphs in more than 99% of cases. They can also generalize to graphs whose structure differs from those encountered in training. Finally, they can predict almost perfectly the equilibria of a small set of known biological networks. Our approach is very economical in experimental data and uses only small and shallow deep-learning models, far from the large architectures commonly used in machine translation. Such results pave the way for larger use of deep learning models for problems related to biological networks in key areas such as quantitative systems pharmacology, systems biology, and synthetic biology.

What problem does this paper attempt to address?

This paper attempts to use deep-learning models, especially the Transformer architecture, to predict the equilibrium state of metabolic networks. Specifically, it focuses on two main problems:

1. **Given a general metabolic network, does an equilibrium state exist?** This problem is qualitative and involves determining whether a given metabolic network can reach a stable state. According to the network topology, if all nodes connected to the input nodes are also connected to the output nodes, then an equilibrium state exists.
2. **If an equilibrium state exists, can the metabolite concentration of each node be calculated?** This problem is quantitative. Assuming that the equilibrium state exists and is unique, it is necessary to calculate the metabolite concentration of each node in the equilibrium state. For networks with linear kinetics, this problem can be solved by matrix inversion, that is, \( \mathbf{J}^{-1}(\mathbf{f}) \boldsymbol{\phi} \), where \( \mathbf{J}(\mathbf{f}) \) is the Jacobian matrix of the network flux \( \mathbf{f} \), and \( \boldsymbol{\phi} \) is the input vector.

### Solution

To train the model, the authors generated a large number of random metabolic network datasets and used these datasets to train the deep-learning model. The specific steps are as follows:

1. **Data generation**:
   - Use the Erdős-Rényi model to generate random graphs, select a certain number of nodes and edges, and assign random weights to the edges.
   - Add randomly connected input and output nodes.
   - Use an algorithm to determine whether the network has an equilibrium state and, if so, its metabolite concentrations.
2. **Model training**:
   - Use the Transformer model for training. The Transformer is a deep-learning architecture originally used for natural language processing.
   - The input is the symbolic representation of the graph, and the output is the existence or non-existence of the equilibrium state (qualitative problem) or the metabolite concentration of each node (quantitative problem).
3. **Model evaluation**:
   - Evaluate the accuracy of the model on an unseen test set.
   - Test the generalization ability of the model on graphs with different distributions, including different numbers of nodes, different edge densities, and different graph models (such as small-world and scale-free networks).

### Main findings

- **High accuracy**: The model achieves an accuracy of over 99% on qualitative problems, and also achieves very high accuracy in most cases on quantitative problems.
- **Generalization ability**: The model not only performs well on graphs with the same distribution as the training set, but also maintains high accuracy on graphs with different distributions.
- **Practical application**: The model also performs very well on real-life biological metabolic networks and can accurately predict the equilibrium state and metabolite concentration in most cases.

### Significance

This study shows that deep-learning models, especially the Transformer architecture, can be effectively applied to complex problems in biology, such as predicting the equilibrium state of metabolic networks. This paves the way for a wider application of deep-learning techniques in fields such as systems pharmacology, systems biology, and synthetic biology in the future.
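The data-generation steps and the two questions described above (does an equilibrium exist, and what are the concentrations) can be illustrated with a minimal NumPy sketch. It is not the authors' code. It assumes first-order (linear) kinetics in which the flux along an edge i → j is `K[i, j] * x[i]`, so the mass balance at each node reads `K.T @ x + phi = (K.sum(axis=1) + out) * x`. The function names (`random_network`, `reachable`, `equilibrium`), the rate-constant range 0.1 to 1.0, and the representation of output nodes as a vector of drain rates `out` are all hypothetical choices made for illustration.

```python
import numpy as np


def random_network(n, p, n_in, n_out, rng):
    """Directed Erdős-Rényi graph with random edge weights, inputs and outputs.

    Assumption: weights are rate constants drawn uniformly from [0.1, 1.0).
    """
    mask = rng.random((n, n)) < p            # each edge present with probability p
    np.fill_diagonal(mask, False)            # no self-loops
    K = np.where(mask, rng.uniform(0.1, 1.0, (n, n)), 0.0)  # K[i, j]: rate of i -> j
    phi = np.zeros(n)                        # external inflow at input nodes
    phi[rng.choice(n, n_in, replace=False)] = rng.uniform(0.1, 1.0, n_in)
    out = np.zeros(n)                        # drain rate at output nodes
    out[rng.choice(n, n_out, replace=False)] = rng.uniform(0.1, 1.0, n_out)
    return K, phi, out


def reachable(adj, start):
    """Boolean mask of nodes reachable from `start` along edges adj[i, j]."""
    seen = start.copy()
    frontier = start.copy()
    while frontier.any():
        nxt = adj[frontier].any(axis=0) & ~seen  # unseen successors of the frontier
        seen |= nxt
        frontier = nxt
    return seen


def equilibrium(K, phi, out):
    """Return steady-state concentrations, or None if no equilibrium exists."""
    adj = K > 0
    fed = reachable(adj, phi > 0)            # nodes that receive material from an input
    drained = reachable(adj.T, out > 0)      # nodes from which an output can be reached
    if (fed & ~drained).any():               # material accumulates somewhere
        return None
    x = np.zeros(len(phi))                   # nodes that are never fed stay at zero
    idx = np.flatnonzero(fed)
    if idx.size:
        Ks = K[np.ix_(idx, idx)]
        # Mass balance on fed nodes: (diag(outflow rates) - Ks^T) x = phi
        M = np.diag(Ks.sum(axis=1) + out[idx]) - Ks.T
        x[idx] = np.linalg.solve(M, phi[idx])
    return x


if __name__ == "__main__":
    K, phi, out = random_network(10, 0.2, 2, 2, np.random.default_rng(0))
    x = equilibrium(K, phi, out)
    print("no equilibrium" if x is None else np.round(x, 3))
```

In this sketch the qualitative label is whether `equilibrium` returns `None`, and the quantitative label is the returned vector. Solving the linear system with `np.linalg.solve` is used in place of forming an explicit inverse, which is a numerical convenience rather than a change to the underlying matrix-inversion formulation.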

A deep language model to predict metabolic network equilibria

Predicting Drug-target Binding Affinity Based on Graph Isomorphism Network and iTransformer

A deep learning architecture for metabolic pathway prediction

Leveraging large language models for metabolic engineering design

DeepMetabolism: A Deep Learning System to Predict Phenotype from Genome Sequencing

Transformer-based deep learning for predicting protein properties in the life sciences

BioStructNet: Structure-Based Network with Transfer Learning for Predicting Biocatalyst Functions

Automated Extraction and Visualization of Metabolic Networks from Biomedical Literature Using a Large Language Model

ProtTrans: Towards Cracking the Language of Life's Code Through Self-Supervised Deep Learning and High Performance Computing

Language model-guided anticipation and discovery of unknown metabolites

Evaluation of network architecture and data augmentation methods for deep learning in chemogenomics

Deep Learning Based kcat Prediction Enables Improved Enzyme Constrained Model Reconstruction

Predicting microbial genome-scale metabolic networks directly from 16S rRNA gene sequences

Predicting equilibrium distributions for molecular systems with deep learning

Transformer Performance for Chemical Reactions: Analysis of Different Predictive and Evaluation Scenarios

Towards Predicting Equilibrium Distributions for Molecular Systems with Deep Learning

Transformers and Large Language Models for Chemistry and Drug Discovery

BioNet: a large-scale and heterogeneous biological network model for interaction prediction with graph convolution

3D Deep Learning for Biological Function Prediction from Physical Fields

Deep learning allows genome-scale prediction of Michaelis constants from structural features

Deep learning systems as complex networks