Learning graph representations of biochemical networks and its application to enzymatic link prediction

Julie Jiang,Li-Ping Liu,Soha Hassoun
DOI: https://doi.org/10.1093/bioinformatics/btaa881
IF: 5.8
2020-10-14
Bioinformatics
Abstract:
Motivation: The complete characterization of enzymatic activities between molecules remains incomplete, hindering biological engineering and limiting biological discovery. We develop in this work a technique, enzymatic link prediction (ELP), for predicting the likelihood of an enzymatic transformation between two molecules. ELP models enzymatic reactions cataloged in the KEGG database as a graph. ELP is innovative over prior works in using graph embedding to learn molecular representations that capture not only molecular and enzymatic attributes but also graph connectivity.
Results: We explore transductive (test nodes included in the training graph) and inductive (test nodes not part of the training graph) learning models. We show that ELP achieves high AUC when learning node embeddings using both graph connectivity and node attributes. Further, we show that graph embedding improves link prediction by 30% in area under curve over fingerprint-based similarity approaches and by 8% over support vector machines. We compare ELP against rule-based methods. We also evaluate ELP for predicting links in pathway maps and for reconstruction of edges in reaction networks of four common gut microbiota phyla: actinobacteria, bacteroidetes, firmicutes and proteobacteria. To emphasize the importance of graph embedding in the context of biochemical networks, we illustrate how graph embedding can guide visualization.
Availability and implementation: The code and datasets are available through https://github.com/HassounLab/ELP.
biochemical research methods,biotechnology & applied microbiology,mathematical & computational biology
What problem does this paper attempt to address?
The paper addresses the problem of predicting the likelihood of an enzymatic transformation between two molecules, framed as a link prediction problem. Specifically, the authors develop a technique named Enzymatic Link Prediction (ELP) that uses graph embedding to learn molecular representations. These representations capture not only the properties of molecules and enzymes but also the connectivity of the graph. This helps to compensate for the incomplete characterization of enzymatic activities in existing databases, supporting biological engineering and biological discovery.

### Main Contributions
1. **Graph Structure Mapping**: ELP maps known enzymatic reactions (from the KEGG database) into a graph structure, where compounds are nodes and reactions are edges. This lets ELP use the connectivity of the entire graph to predict enzymatic links, rather than just part of the graph structure.
2. **Graph Embedding**: ELP uses graph embedding techniques to learn molecular representations that reflect not only the structural properties of molecules but also their relationships in the network graph. This kind of embedding performs well at predicting missing information, identifying false interactions, predicting new links in future networks, and analyzing biomedical networks.

### Method Overview
- **Graph Construction**: Construct a graph from the KEGG database, with compounds as nodes and reactions as edges. Exclude high-connectivity cofactor nodes and their edges to focus on the connections between non-cofactor metabolites.
- **Graph Embedding**: Use the Embedding Propagation (EP) algorithm to learn node embedding vectors. EP learns the embedding vector of each node by iteratively propagating messages between neighboring nodes.
- **Link Prediction**: Train a neural network that takes the embedding vectors of a pair of nodes as input and outputs the probability that a connection exists between the two molecules. The evaluation metric is the Area Under the Curve (AUC).

### Experimental Results
- **Transductive Learning**: ELP outperforms other methods in the transductive learning scenario, especially when combining connectivity and fingerprint properties, with an AUC reaching 0.97.
- **Inductive Learning**: ELP also maintains high performance in the inductive learning scenario, with an AUC of 0.93-0.94.
- **Pathway Reconstruction**: ELP also performs well in the metabolic pathway reconstruction task, with an average AUC of 0.88.
- **Microbial Community Reconstruction**: In reconstructing the enzymatic reaction networks of four common gut microbial phyla (Actinobacteria, Bacteroidetes, Firmicutes, and Proteobacteria), ELP reaches an AUC of 0.89-0.91.
- **Rule Reconstruction**: ELP also compares favorably with rule-based methods and can effectively recover edges associated with the most common rules.

### Conclusion
ELP uses graph embedding to predict the likelihood of an enzymatic transformation between two molecules, providing new tools and opportunities for biological engineering and biological discovery.
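
As a rough illustration of the graph construction step and the AUC metric described in the method overview, the following Python sketch builds a compound graph from reaction pairs, drops a given set of cofactor nodes, and scores candidate links with a fingerprint-based similarity. This is not ELP itself: the Embedding Propagation step and the neural-network classifier are omitted, and the score is the kind of fingerprint similarity baseline that ELP is compared against (Tanimoto similarity is assumed here). The compound names, fingerprints, cofactor set, and function names are hypothetical toy values chosen for illustration; the authors' implementation is in the repository linked in the abstract.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(reaction_pairs, cofactors):
    """Undirected compound graph: compounds are nodes, reactions are edges."""
    adj = defaultdict(set)
    for a, b in reaction_pairs:
        # Skip self-loops and any edge that touches a cofactor node.
        if a == b or a in cofactors or b in cofactors:
            continue
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = fp_a | fp_b
    return len(fp_a & fp_b) / len(union) if union else 0.0


def auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win; this equals the area under the ROC curve.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy data (hypothetical): four metabolites and one cofactor.
reaction_pairs = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "ATP"), ("C", "ATP")]
cofactors = {"ATP"}  # assumed to be known in advance
fingerprints = {
    "A": {1, 2, 3},
    "B": {1, 2, 3, 4},
    "C": {2, 3, 4, 5},
    "D": {4, 5, 6},
}

graph = build_graph(reaction_pairs, cofactors)
nodes = sorted(graph)

# Positives are observed edges; negatives are all remaining node pairs.
positives = [(a, b) for a, b in combinations(nodes, 2) if b in graph[a]]
negatives = [(a, b) for a, b in combinations(nodes, 2) if b not in graph[a]]

pos_scores = [tanimoto(fingerprints[a], fingerprints[b]) for a, b in positives]
neg_scores = [tanimoto(fingerprints[a], fingerprints[b]) for a, b in negatives]

auc_value = auc(pos_scores, neg_scores)
print(f"Baseline AUC on toy graph: {auc_value:.3f}")  # prints 0.944
```

In a fuller pipeline, the Tanimoto score would be replaced by the output of a classifier trained on learned node embeddings, while the positive/negative pair construction and the AUC computation would stay essentially the same.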