Abstract:
Background: Enzymatic reaction networks are crucial for exploring the mechanistic function of metabolites and proteins in biological systems, for understanding the etiology of diseases, and for identifying potential targets for drug discovery. The increasing number of known metabolic reactions allows the development of deep learning-based methods to discover new enzymatic reactions, which will expand the landscape of existing enzymatic reaction networks and help investigate disrupted metabolism in diseases.
Results: In this study, we propose the MPI-VGAE framework to predict metabolite-protein interactions (MPI) in a genome-scale heterogeneous enzymatic reaction network across ten organisms with thousands of enzymatic reactions. We improved the Variational Graph Autoencoders (VGAE) model to incorporate the molecular features of metabolites and proteins together with neighboring features, achieving the best predictive performance for MPI. The MPI-VGAE framework showed robust performance in the reconstruction of hundreds of metabolic pathways and five functional enzymatic reaction networks. The MPI-VGAE framework was also applied to a homogeneous metabolic reaction network and achieved performance as high as other state-of-the-art methods. Furthermore, the MPI-VGAE framework could be applied to reconstruct disease-specific MPI networks based on hundreds of disrupted metabolites and proteins in Alzheimer's disease and colorectal cancer, respectively. A substantial number of new potential enzymatic reactions were predicted and validated by molecular docking. These results highlight the potential of the MPI-VGAE framework for the discovery of novel disease-related enzymatic reactions and drug targets in real-world applications.
Data availability and implementation: The MPI-VGAE framework and datasets are publicly accessible on GitHub at https://github.com/mmetalab/mpi-vgae .
Author biographies:
Cheng Wang received his Ph.D. in Chemistry from The Ohio State University, USA. He is currently an Assistant Professor in the School of Public Health at Shandong University, China. His research interests include bioinformatics and machine learning-based approaches with applications to biomedical networks.
Chuang Yuan is a research assistant at Shandong University. He obtained his MS degree in Biology at the University of Science and Technology of China. His research interests include biochemistry and molecular biology, cell biology, biomedicine, bioinformatics, and computational biology.
Yahui Wang is a PhD student in the Department of Chemistry at Washington University in St. Louis. Her research interests include biochemistry, mass spectrometry-based metabolomics, and cancer metabolism.
Ranran Chen is a master's student in the School of Public Health at Shandong University, China.
Yuying Shi is a master's student in the School of Public Health at Shandong University, China.
Gary J. Patti is the Michael and Tana Powell Professor at Washington University in St. Louis, where he holds appointments in the Department of Chemistry and the Department of Medicine. He is also the Senior Director of the Center for Metabolomics and Isotope Tracing at Washington University. His research interests include metabolomics, bioinformatics, high-throughput mass spectrometry, environmental health, cancer, and aging.
Leyi Wei received his Ph.D. in Computer Science from Xiamen University, China. He is currently a Professor in the School of Software at Shandong University, China. His research interests include machine learning and its applications to bioinformatics.
Qingzhen Hou received his Ph.D. from the Centre for Integrative Bioinformatics VU (IBIVU) at Vrije Universiteit Amsterdam, the Netherlands. Since 2020, he has served as the head of the Bioinformatics Center in the National Institute of Health Data Science of China and as an Assistant Professor in the School of Public Health, Shandong University, China. His areas of research are bioinformatics and computational biophysics.
Key points:
Genome-scale heterogeneous networks of metabolite-protein interaction (MPI) based on thousands of enzymatic reactions across ten organisms were constructed semi-automatically.
An enzymatic reaction prediction method called Metabolite-Protein Interaction Variational Graph Autoencoders (MPI-VGAE) was developed and optimized to achieve higher performance than existing machine learning methods by using the molecular features of both metabolites and proteins.
MPI-VGAE is broadly useful for applications involving the reconstruction of metabolic pathways, functional enzymatic reaction networks, and homogeneous networks (e.g., metabolic reaction networks).
By applying MPI-VGAE to Alzheimer's disease and colorectal cancer, we obtained several novel, biologically meaningful disease-related protein-metabolite reactions. Moreover, we examined plausible binding details of these protein-metabolite interactions using molecular docking, which provided useful information for understanding disease mechanisms and for drug design.
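To make the underlying idea concrete, the following is a minimal sketch of a generic variational graph autoencoder forward pass for link prediction. It is not the MPI-VGAE implementation: the toy graph, the random placeholder features, the untrained random weights, and the names `normalize_adjacency` and `vgae_forward` are all assumptions made for illustration. It only shows how node features and neighborhood structure can be combined into latent embeddings, and how an inner-product decoder turns those embeddings into link scores.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2, as used by GCN layers."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops
    deg = a_hat.sum(axis=1)                 # node degrees (>= 1 thanks to self-loops)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def vgae_forward(adj, feats, w0, w_mu, w_logvar, rng):
    """One forward pass of a generic VGAE (illustrative, untrained)."""
    a_norm = normalize_adjacency(adj)
    hidden = np.maximum(a_norm @ feats @ w0, 0.0)   # shared GCN layer + ReLU
    mu = a_norm @ hidden @ w_mu                     # mean of latent embedding
    logvar = a_norm @ hidden @ w_logvar             # log-variance of latent embedding
    noise = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * noise           # reparameterization trick
    logits = z @ z.T                                # inner-product decoder
    probs = 1.0 / (1.0 + np.exp(-logits))           # predicted link probabilities
    return probs, mu, logvar


# Hypothetical toy network: nodes 0-2 are metabolites, nodes 3-4 are proteins.
edges = [(0, 3), (1, 3), (1, 4), (2, 4)]
n_nodes, n_feats, n_hidden, n_latent = 5, 4, 8, 2
adj = np.zeros((n_nodes, n_nodes))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

rng = np.random.default_rng(0)
feats = rng.standard_normal((n_nodes, n_feats))        # placeholder node features
w0 = 0.1 * rng.standard_normal((n_feats, n_hidden))    # untrained weights
w_mu = 0.1 * rng.standard_normal((n_hidden, n_latent))
w_logvar = 0.1 * rng.standard_normal((n_hidden, n_latent))

probs, mu, logvar = vgae_forward(adj, feats, w0, w_mu, w_logvar, rng)
# Score for a candidate metabolite-protein link that is absent from the toy graph.
candidate_score = probs[0, 4]
```

In a trained model, the weights would be fitted by maximizing the reconstruction likelihood of observed links minus a KL penalty on the latent embeddings, and candidate metabolite-protein pairs would then be ranked by their decoder scores.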
