NetAurHPD: Network Auralization Hyperlink Prediction Model to Identify Metabolic Pathways from Metabolomics Data

Tamir Bar-Tov, Rami Puzis, David Toubiana
2024-10-29
Abstract: Metabolite biosynthesis is regulated via metabolic pathways, which can be activated and deactivated within organisms. Understanding and identifying an organism's metabolic pathway network is crucial for various research fields, including crop and livestock breeding, pharmacology, and medicine. The problem of identifying whether a pathway is part of a studied metabolic system is commonly framed as a hyperlink prediction problem. The most important challenge in predicting metabolic pathways is the sparsity of labeled data. This challenge can be partially mitigated using metabolite correlation networks, which are affected by all active pathways, including those not yet confirmed in laboratory experiments. Unfortunately, extracting properties that can confirm or refute the existence of a metabolic pathway in a particular organism is not a trivial task. In this research, we introduce Network Auralization Hyperlink Prediction (NetAurHPD), a framework that relies on (1) graph auralization to extract and aggregate representations of nodes in metabolite correlation networks and (2) a data augmentation method that generates metabolite correlation networks given a subset of chemical reactions defined as hyperlinks. Experiments with metabolite correlation-based networks of tomato pericarp demonstrate promising results for NetAurHPD compared to alternative methods. Furthermore, the application of data augmentation improved NetAurHPD's learning capabilities and overall performance. Additionally, NetAurHPD outperformed a state-of-the-art method in experiments under challenging conditions and has the potential to be a valuable tool for exploring organisms with limited existing knowledge.
Molecular Networks
What problem does this paper attempt to address?
The problem that this paper attempts to solve is how to identify the metabolic pathways of organisms from metabolomics data. Specifically, the authors propose a new method, NetAurHPD (Network Auralization Hyperlink Prediction Model), to predict whether a given metabolic pathway exists in the studied organism.

### Problem Background

The biosynthesis of metabolites is regulated by metabolic pathways, which can be activated or deactivated within an organism. Understanding and identifying the metabolic pathway network of an organism is crucial for multiple research fields, including crop and livestock breeding, pharmacology, and medicine. Determining whether a pathway belongs to a specific metabolic system is usually framed as a hyperlink prediction problem, and the main challenge is the sparsity of labeled data.

### Research Objectives

To address this challenge, the authors propose the NetAurHPD model, aiming to:

1. **Utilize Metabolite Correlation Networks (Metabolite CNs)**: extract and aggregate node representations from these networks, which capture the influence of all active pathways, including those not yet confirmed in laboratory experiments.
2. **Introduce a data augmentation method**: generate metabolite correlation networks from a subset of known chemical reactions defined as hyperlinks, to alleviate the data sparsity problem and improve the model's learning ability and overall performance.

### Method Overview

The core idea of NetAurHPD is to combine graph auralization with a deep learning (DL) model:

- **Graph Auralization**: convert each node of the metabolite correlation network into a sound-wave-like signal by propagating energy through the network until the energy of the entire network is evenly distributed.
- **M5 Architecture**: feed each input signal to M5, a deep convolutional neural network designed for raw waveforms, whose sliding-window convolutions process the signal and finally produce a single value as the prediction.
- **Loss Function**: binary cross-entropy (BCE) measures the difference between the predicted value and the actual class.
- **Threshold Optimization**: the Youden index is used to choose the classification threshold that best separates the two classes.

### Data Augmentation

To further improve performance, the authors also propose a data augmentation method that generates additional metabolite correlation networks by simulating metabolic activity within an organism. The steps are:

- Label the substrate and product metabolites of each pathway.
- Create different "species" subgroups, representing organisms whose metabolic activity differs because of environmental conditions or genetic manipulation.
- Simulate the execution of a series of metabolic pathways, compute the Pearson correlation coefficient between each pair of metabolites, and build an augmented metabolite correlation network.

### Experimental Results

The authors evaluated NetAurHPD on metabolite correlation networks of tomato pericarp. It performed well on datasets from three different years, with micro-averaged AUCs of 0.87, 0.857, and 0.884, respectively, and also achieved relatively high accuracy. In addition, data augmentation improved the model's learning ability and robustness.

In conclusion, NetAurHPD provides a promising new tool for identifying metabolic pathways from metabolomics data, especially for organisms with limited existing knowledge.
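To make the graph auralization step from the Method Overview more concrete, here is a minimal sketch under stated assumptions. It uses a simple conservative diffusion rule (each node moves toward its neighbours' energy level) as a stand-in; this rule, the function name `auralize_node`, and the parameters `tol` and `max_steps` are hypothetical illustrations, not the propagation model published with NetAurHPD. One unit of energy is injected at a source node of an undirected, connected graph and diffused until every node holds roughly the same share; the energy observed at the source over time serves as that node's waveform.

```python
from typing import Dict, List, Set


def auralize_node(adj: Dict[int, Set[int]], source: int,
                  tol: float = 1e-6, max_steps: int = 10_000) -> List[float]:
    """Hypothetical sketch: inject one unit of energy at `source`, diffuse it
    until all nodes hold (nearly) equal energy, and return the energy seen at
    `source` after every step as its 'waveform'.

    Assumes `adj` is an undirected, connected graph (symmetric adjacency sets).
    """
    nodes = sorted(adj)
    # Step size 1/(max degree + 1) keeps every node's energy non-negative
    # and makes the iteration converge.
    alpha = 1.0 / (max(len(adj[v]) for v in nodes) + 1)
    energy = {v: 0.0 for v in nodes}
    energy[source] = 1.0
    target = 1.0 / len(nodes)  # evenly distributed total energy
    waveform = [energy[source]]
    for _ in range(max_steps):
        # Each node moves toward its neighbours' energy; on an undirected
        # graph the flows cancel pairwise, so total energy is conserved.
        energy = {v: energy[v] + alpha * sum(energy[u] - energy[v] for u in adj[v])
                  for v in nodes}
        waveform.append(energy[source])
        if all(abs(e - target) < tol for e in energy.values()):
            break
    return waveform


# Usage: path graph 0 - 1 - 2
path = {0: {1}, 1: {0, 2}, 2: {1}}
wave = auralize_node(path, source=0)
```

On the path graph 0-1-2, the waveform of node 0 starts at 1.0, drops to 2/3 after the first step, and settles near 1/3. The sketch only illustrates the "propagate until evenly distributed" idea; how NetAurHPD records and aggregates node signals for a hyperlink is not reproduced here.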
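The loss and threshold steps listed under Method Overview can be written out in a few lines of plain Python. This is a generic sketch of binary cross-entropy and of Youden's J statistic (J = sensitivity + specificity - 1 = TPR - FPR), not code from the paper; the function names `bce_loss` and `youden_threshold` are hypothetical, and we assume the threshold is chosen on held-out validation scores, with a hyperlink predicted positive when its score is at least the threshold.

```python
import math
from typing import Sequence, Tuple


def bce_loss(y_true: Sequence[int], y_prob: Sequence[float], eps: float = 1e-12) -> float:
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def youden_threshold(y_true: Sequence[int], y_score: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, J) maximizing Youden's J = TPR - FPR,
    predicting positive when score >= threshold."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both positive and negative examples")
    best_t, best_j = float("inf"), -1.0
    for t in sorted(set(y_score)):  # every distinct score is a candidate cut-off
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 0)
        j = tp / pos - fp / neg
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j
```

For labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, `youden_threshold` returns `(0.35, 0.5)`. Libraries such as scikit-learn expose the same ROC quantities through `sklearn.metrics.roc_curve`, from which J can be computed as `tpr - fpr`.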
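The data augmentation steps can be illustrated with a toy simulation. Everything below is an assumption made for illustration: the abundance model (a baseline level plus Gaussian noise, with each active pathway draining its substrates and raising its products at a random rate), the activation probability `p_active`, the edge threshold on |r|, and the function names `simulate_samples`, `pearson`, and `correlation_network` are hypothetical and not taken from the paper. What it does show is the pipeline described in the Data Augmentation section: label substrates and products, activate different pathway subsets per simulated "species" subgroup, then build a correlation network from pairwise Pearson coefficients.

```python
import math
import random
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical representation: a pathway is (substrate metabolites, product metabolites).
Pathway = Tuple[Set[str], Set[str]]


def simulate_samples(pathways: List[Pathway], metabolites: List[str],
                     n_species: int = 3, samples_per_species: int = 20,
                     p_active: float = 0.5, seed: int = 0) -> Dict[str, List[float]]:
    """Toy simulation: each 'species' subgroup activates a random subset of
    pathways; in each sample an active pathway runs at a random rate, lowering
    its substrates and raising its products. Returns one profile per metabolite."""
    rng = random.Random(seed)
    profiles: Dict[str, List[float]] = {m: [] for m in metabolites}
    for _ in range(n_species):
        active = [pw for pw in pathways if rng.random() < p_active]
        for _ in range(samples_per_species):
            level = {m: 10.0 + rng.gauss(0.0, 0.1) for m in metabolites}  # baseline + noise
            for substrates, products in active:
                rate = rng.uniform(0.0, 2.0)
                for m in substrates:
                    level[m] -= rate
                for m in products:
                    level[m] += rate
            for m in metabolites:
                profiles[m].append(level[m])
    return profiles


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient (0.0 if either series is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def correlation_network(profiles: Dict[str, List[float]],
                        threshold: float = 0.7) -> Set[Tuple[str, str]]:
    """Keep an undirected edge between two metabolites when |r| >= threshold."""
    return {(a, b) for a, b in combinations(sorted(profiles), 2)
            if abs(pearson(profiles[a], profiles[b])) >= threshold}


# Usage: pathways A -> B and C -> D, always active in every subgroup.
toy = simulate_samples([({"A"}, {"B"}), ({"C"}, {"D"})], ["A", "B", "C", "D"], p_active=1.0)
edges = correlation_network(toy)
```

With both pathways always active, the substrate and product of each pathway become strongly anti-correlated, so the resulting network is expected to contain only the edges (A, B) and (C, D). Lowering `p_active` makes the subgroups differ in which pathways run, which is the kind of variation the augmentation aims to capture.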