A deep language model to predict metabolic network equilibria

François Charton, Amaury Hayat, Sean T. McQuade, Nathaniel J. Merrill, Benedetto Piccoli
DOI: https://doi.org/10.48550/arXiv.2112.03588
2021-12-07
Abstract: We show that deep learning models, and especially architectures like the Transformer, originally intended for natural language, can be trained on randomly generated datasets to predict to very high accuracy both the qualitative and quantitative features of metabolic networks. Using standard mathematical techniques, we create large sets (40 million elements) of random networks that can be used to train our models. These trained models can predict network equilibrium on random graphs in more than 99% of cases. They can also generalize to graphs with different structure than those encountered at training. Finally, they can predict almost perfectly the equilibria of a small set of known biological networks. Our approach is both very economical in experimental data and uses only small and shallow deep-learning models, far from the large architectures commonly used in machine translation. Such results pave the way for larger use of deep learning models for problems related to biological networks in key areas such as quantitative systems pharmacology, systems biology, and synthetic biology.
Machine Learning, Computation and Language
What problem does this paper attempt to address?
The paper uses deep-learning models, in particular the Transformer architecture, to predict the equilibrium state of metabolic networks. It focuses on two questions:

1. **Does a given metabolic network have an equilibrium?** This question is qualitative: can the network reach a steady state? The answer follows from the network topology: an equilibrium exists if every node reachable from an input node also has a path to an output node.
2. **If an equilibrium exists, what is the metabolite concentration at each node?** This question is quantitative. Assuming the equilibrium exists and is unique, the task is to compute the concentration of every node at equilibrium. For networks with linear kinetics, this reduces to a matrix inversion, \( \mathbf{J}^{-1}(\mathbf{f})\,\boldsymbol{\phi} \), where \( \mathbf{J}(\mathbf{f}) \) is the Jacobian matrix associated with the network fluxes \( \mathbf{f} \) and \( \boldsymbol{\phi} \) is the input vector.

### Solution

To train the models, the authors generated large datasets of random metabolic networks. The main steps are:

1. **Data generation**:
   - Generate random graphs with the Erdős-Rényi model: choose a number of nodes and edges, and assign random weights to the edges.
   - Add input and output nodes with random connections.
   - Use an algorithm to decide whether the network has an equilibrium and, if it does, to compute the metabolite concentrations.
2. **Model training**:
   - Train a Transformer, a deep-learning architecture originally designed for natural language processing.
   - The input is a symbolic representation of the graph; the output is either the existence or non-existence of an equilibrium (qualitative problem) or the metabolite concentration at each node (quantitative problem).
3. **Model evaluation**:
   - Measure accuracy on a held-out test set.
   - Test generalization on graphs drawn from different distributions: different numbers of nodes, different edge densities, and different graph models (such as small-world and scale-free networks).

### Main findings

- **High accuracy**: The models exceed 99% accuracy on the qualitative problem and reach very high accuracy in most cases on the quantitative problem.
- **Generalization**: The models perform well not only on graphs drawn from the training distribution but also on graphs drawn from different distributions.
- **Real networks**: The models also perform very well on a small set of known biological metabolic networks, correctly predicting the existence of an equilibrium and the metabolite concentrations in most cases.

### Significance

This study shows that deep-learning models, especially Transformers, can be applied effectively to complex problems in biology, such as predicting the equilibria of metabolic networks. This paves the way for wider use of deep learning in fields such as systems pharmacology, systems biology, and synthetic biology.
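The two questions the paper addresses (does an equilibrium exist, and what are the concentrations) and the random-graph data generation can be illustrated with a small sketch. It is not the authors' code. It assumes linear kinetics in which an edge i→j carries flux f_ij·x_i, input nodes receive a constant inflow φ_i, and output nodes drain at rate d_i·x_i. The sign convention for J is also an assumption: outflow rates sit on the diagonal so that the steady state is x* = J⁻¹φ, matching the formula in the text. The graph size is controlled through an edge probability p (the G(n, p) variant) rather than a fixed edge count, the rates are drawn from uniform(0.1, 1.0), and the names `random_network`, `has_equilibrium` and `equilibrium` are hypothetical. A plain-Python Gaussian elimination replaces a library solver so that the example has no dependencies.

```python
import random
from collections import deque

# Assumed network format (illustrative, not the paper's data format):
#   n: internal nodes 0..n-1;  edges: {(i, j): f_ij}, flux f_ij * x_i along i -> j
#   inputs: {i: phi_i} constant inflows;  outputs: {i: d_i}, drain flux d_i * x_i


def random_network(n, p, n_in=1, n_out=1, seed=None):
    """Directed Erdős-Rényi G(n, p) graph with random rates and random input/output nodes."""
    rng = random.Random(seed)
    edges = {(i, j): rng.uniform(0.1, 1.0)
             for i in range(n) for j in range(n)
             if i != j and rng.random() < p}
    inputs = {i: rng.uniform(0.1, 1.0) for i in rng.sample(range(n), n_in)}
    outputs = {i: rng.uniform(0.1, 1.0) for i in rng.sample(range(n), n_out)}
    return n, edges, inputs, outputs


def _adjacency(n, edges):
    """Forward and reverse adjacency lists."""
    fwd, rev = {i: [] for i in range(n)}, {i: [] for i in range(n)}
    for i, j in edges:
        fwd[i].append(j)
        rev[j].append(i)
    return fwd, rev


def _reach(starts, adj):
    """Nodes reachable from `starts` (included) by breadth-first search."""
    seen, queue = set(starts), deque(starts)
    while queue:
        for v in adj[queue.popleft()]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def has_equilibrium(n, edges, inputs, outputs):
    """Qualitative problem: every node fed by an input must have a path to an output."""
    fwd, rev = _adjacency(n, edges)
    fed = _reach(list(inputs), fwd)
    drained = _reach(list(outputs), rev)
    return fed <= drained


def _solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [row[:] + [b[k]] for k, row in enumerate(a)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, m):
            factor = aug[r][c] / aug[c][c]
            for k in range(c, m + 1):
                aug[r][k] -= factor * aug[c][k]
    x = [0.0] * m
    for c in range(m - 1, -1, -1):
        x[c] = (aug[c][m] - sum(aug[c][k] * x[k] for k in range(c + 1, m))) / aug[c][c]
    return x


def equilibrium(n, edges, inputs, outputs):
    """Quantitative problem: x* = J^{-1} phi, or None when no equilibrium exists.

    Assumed sign convention: J[i][i] is the total outflow rate of node i and
    J[j][i] = -f_ij, so the steady state satisfies J x = phi. Nodes not fed by
    any input carry no flux and are given concentration 0.
    """
    if not has_equilibrium(n, edges, inputs, outputs):
        return None
    fwd, _ = _adjacency(n, edges)
    fed = sorted(_reach(list(inputs), fwd))
    index = {node: k for k, node in enumerate(fed)}
    jac = [[0.0] * len(fed) for _ in fed]
    for node in fed:
        jac[index[node]][index[node]] += outputs.get(node, 0.0)
    for (i, j), rate in edges.items():
        if i in index:  # j is then fed too, since it is reachable from i
            jac[index[i]][index[i]] += rate
            jac[index[j]][index[i]] -= rate
    x = [0.0] * n
    for node, value in zip(fed, _solve(jac, [inputs.get(v, 0.0) for v in fed])):
        x[node] = value
    return x


if __name__ == "__main__":
    net = random_network(10, 0.2, n_in=2, n_out=2, seed=0)
    print("equilibrium exists:", has_equilibrium(*net))
    print("concentrations:", equilibrium(*net))
```

J is restricted to the input-fed nodes for a reason. When every fed node can reach a drain, that restricted matrix is nonsingular. A closed pool that no input reaches would make the full matrix singular without affecting the fed part of the answer. The reachability test is linear in the size of the graph, while the dense elimination is cubic in the number of fed nodes, which is fine for small illustrative graphs.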
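A Transformer reads and writes token sequences, so the graph given as input and the answer it must produce both have to be serialized. The summary does not specify the paper's encoding. The sketch below is a purely hypothetical scheme that illustrates what "a symbolic representation of the graph" could look like. The token names (`N2`, `EDGE`, `V0`, `IN`, `OUT`, `YES`, `NO`), the sign/mantissa/exponent float encoding, and the network format (n nodes, an edge-rate dict, an inflow dict, a drain-rate dict) are all assumptions.

```python
# Hypothetical tokenization of a metabolic network for a sequence-to-sequence Transformer.
# Network format (assumed): n internal nodes 0..n-1, edges {(i, j): rate},
# inputs {i: inflow}, outputs {i: drain rate}.


def encode_float(v, digits=3):
    """Encode a real number as [sign, integer mantissa with `digits` digits, base-10 exponent]."""
    if v == 0:
        return ["+", "0", "E0"]
    sign = "+" if v > 0 else "-"
    mantissa, exponent = f"{abs(v):.{digits - 1}e}".split("e")
    # "5.20e-01" -> mantissa "520" and exponent -3, since 520 * 10**-3 == 0.52
    return [sign, mantissa.replace(".", ""), f"E{int(exponent) - (digits - 1)}"]


def encode_network(n, edges, inputs, outputs):
    """Flatten the graph into a token sequence: node count, then edges, inputs, outputs."""
    tokens = [f"N{n}"]
    for (i, j), rate in sorted(edges.items()):
        tokens += ["EDGE", f"V{i}", f"V{j}", *encode_float(rate)]
    for i, inflow in sorted(inputs.items()):
        tokens += ["IN", f"V{i}", *encode_float(inflow)]
    for i, drain in sorted(outputs.items()):
        tokens += ["OUT", f"V{i}", *encode_float(drain)]
    return tokens


def qualitative_target(exists):
    """Target sequence for the qualitative task: does an equilibrium exist?"""
    return ["YES" if exists else "NO"]


def quantitative_target(concentrations):
    """Target sequence for the quantitative task: one encoded concentration per node."""
    return [tok for value in concentrations for tok in encode_float(value)]


if __name__ == "__main__":
    # Two-node chain: inflow 1.0 into node 0, edge 0 -> 1 with rate 2.0, drain 4.0 at node 1.
    print(encode_network(2, {(0, 1): 2.0}, {0: 1.0}, {1: 4.0}))
```

With a fixed three-digit mantissa the vocabulary stays finite, because it contains only signs, mantissas and a bounded range of exponents. The cost is that every rate and concentration is rounded to three significant digits, which caps the precision any model can reach on the quantitative task under this encoding.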