A guide to reverse metabolomics – a framework for big data discovery strategy.
Vincent Charron-Lamoureux, Helena Mannochio-Russo, Santosh Lamichhane, Shipei Xing, Abubaker Patan, Paulo Wender Portal Gomes, Prajit Rajkumar, Victoria Deleray, Andrés Mauricio Caraballo-Rodriguez, Kee Voon Chua, Lye Siang Lee, Zhao Liu, Jianhong Ching, Mingxun Wang, Pieter C. Dorrestein
DOI: https://doi.org/10.26434/chemrxiv-2024-4cb43
2024-12-19
Abstract: Untargeted metabolomics is evolving into a field of big data science. There is growing interest within the metabolomics community in mining MS/MS-based data from public repositories. The theme of this protocol, reverse metabolomics, is a data science strategy that differs from the traditional LC-MS/MS-based untargeted metabolomics approach. In traditional untargeted metabolomics, we first collect samples to address a predefined question and then acquire LC-MS/MS data. We then identify metabolites associated with a phenotype (e.g., disease vs. healthy) and elucidate or validate their structural details (e.g., molecular formula, structural classification, substructure, or complete structural annotation or identification). Reverse metabolomics, however, does not necessarily require collecting new data or structurally characterizing the molecules. Instead, we start with MS/MS spectra of known or unknown molecules and discover phenotype-relevant information such as organ/biofluid distribution, disease condition, intervention status (e.g., pre- and post-intervention), organisms (e.g., mammals vs. others), geography, and any other biologically relevant associations available in public repositories. This protocol guides the reader step by step through using available MS/MS data to discover repository-scale associations for those spectra. As an example, we use MS/MS spectra from three small molecules: phenylalanine-cholic acid (a microbially conjugated bile acid), phenylalanine-C4:0, and histidine-C4:0 (two N-acyl amides). We leverage the GNPS-based framework to explore the microbial producers of these molecules and their associations with health conditions and organ distributions in humans and rodents.
Chemistry
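
The core idea in the abstract (start from MS/MS spectra, find where they occur in public data, then read off the sample metadata) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the match list, the file identifiers, the metadata fields, and the function name `tally_associations` are hypothetical toy values, not output of the GNPS-based tools and not the protocol's actual implementation.

```python
from collections import Counter

# Hypothetical toy data (NOT real repository results).
# Each tuple links a query MS/MS spectrum to a public data file
# in which a matching spectrum was found.
matches = [
    ("phenylalanine-cholic acid", "file_01"),
    ("phenylalanine-cholic acid", "file_02"),
    ("phenylalanine-cholic acid", "file_03"),
    ("histidine-C4:0", "file_02"),
    ("histidine-C4:0", "file_04"),
]

# Hypothetical sample metadata keyed by file identifier.
metadata = {
    "file_01": {"organism": "human", "body_part": "feces"},
    "file_02": {"organism": "human", "body_part": "feces"},
    "file_03": {"organism": "mouse", "body_part": "feces"},
    "file_04": {"organism": "mouse", "body_part": "urine"},
}


def tally_associations(matches, metadata, field):
    """Count, per query spectrum, how many matched files carry each
    value of the given metadata field (e.g. organism or body part)."""
    counts = {}
    seen = set()
    for query, file_id in matches:
        # Count each (query, file) pair once, even if matched repeatedly.
        if (query, file_id) in seen:
            continue
        seen.add((query, file_id))
        info = metadata.get(file_id)
        # Skip files with no metadata or without the requested field.
        if info is None or field not in info:
            continue
        counts.setdefault(query, Counter())[info[field]] += 1
    return counts


if __name__ == "__main__":
    for field in ("organism", "body_part"):
        print(field, tally_associations(matches, metadata, field))
```

In this sketch the association is just a count of matched files per metadata value; a real analysis would also need to account for how many files of each type exist in the repository before interpreting such counts.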