Abstract: Untargeted metabolomics based on liquid chromatography-mass spectrometry technology is quickly gaining widespread application, given its ability to depict the global metabolic pattern in biological samples. However, the data are noisy and plagued by the lack of clear identity of data features measured from samples. Multiple potential matchings exist between data features and known metabolites, while the truth can only be one-to-one matches. Some existing methods attempt to reduce the matching uncertainty, but are far from being able to remove the uncertainty for most features. The existence of the uncertainty causes major difficulty in downstream functional analysis. To address these issues, we develop a novel approach for Bayesian Analysis of Untargeted Metabolomics data (BAUM) to integrate previously separate tasks into a single framework, including matching uncertainty inference, metabolite selection and functional analysis. By incorporating the knowledge graph between variables and using relatively simple assumptions, BAUM can analyze datasets with small sample sizes. By allowing different confidence levels of feature-metabolite matching, the method is applicable to datasets in which feature identities are partially known. Simulation studies demonstrate that, compared with other existing methods, BAUM achieves better accuracy in selecting important metabolites that tend to be functionally consistent and assigning confidence scores to feature-metabolite matches. We analyze a COVID-19 metabolomics dataset and a mouse brain metabolomics dataset using BAUM. Even with a very small sample size of 16 mice per group, BAUM is robust and stable. It finds pathways that conform to existing knowledge, as well as novel pathways that are biologically plausible.

Untargeted metabolomics based on liquid chromatography-mass spectrometry technology depicts the global metabolic pattern in biological samples.
However, multiple potential matchings exist between data features and known metabolites, while the truth can only be one-to-one matches, which causes major difficulty in downstream functional analysis. To address these issues, we develop a novel approach for Bayesian Analysis of Untargeted Metabolomics data (BAUM) to integrate previously separate tasks into a single framework, including matching uncertainty inference, metabolite selection, and functional analysis. The figure illustrates the overall setup of BAUM. The observed data for the model include the feature-level summary statistics computed from the observed metabolic features and the (optional) clinical outcome, the potential feature-metabolite matches and their confidence measures, and the known metabolic network structure. The outputs of the model include the false discovery rate (FDR) for each metabolite and the strength of each feature-metabolite matching. We use a Bayesian latent factor model to characterize the observed feature summary statistics and link them to the unobserved metabolite behavior. We assign a multinomial prior, parameterized by prior probabilities, to the matching indicators, a normal prior to the null component score, a Dirichlet process prior to the alternative component score, and a weighted Potts prior to the metabolite latent class indicators. Generally, the observed summary statistic of a feature is a linear combination of the unobserved scores of its linked metabolites. The weights reflect the confidence level of the metabolite-feature annotation and are estimated from the data. The metabolites are segregated into two latent classes: the clinically relevant class (alternative component) and the clinically irrelevant class (null component). The two classes have different distributions of metabolite scores. Metabolites that are connected on the metabolic network are more likely to belong to the same class.
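
The structure described above can be illustrated with a small toy sketch. This is not the BAUM implementation: the function names, the three-metabolite network, the matching weights, and the smoothing parameter `beta` are all hypothetical. The Dirichlet process prior on alternative scores is replaced here by a simple two-point Gaussian mixture purely for illustration. The sketch only shows (i) a feature statistic formed as a weighted linear combination of the scores of its candidate metabolites and (ii) a weighted Potts-style log prior that rewards connected metabolites for sharing a latent class.

```python
import random


def potts_log_prior(classes, edges, beta=1.0):
    """Unnormalised log prior of a weighted Potts-style model (assumed form):
    beta times the total weight of edges whose endpoints share a class."""
    return beta * sum(w for a, b, w in edges if classes[a] == classes[b])


def draw_scores(classes, rng, alt_shift=3.0):
    """Draw one score per metabolite.

    Null class (0): N(0, 1).
    Alternative class (1): N(+/-alt_shift, 1), a crude stand-in for the
    Dirichlet process prior mentioned in the text (assumption).
    """
    scores = {}
    for metabolite, z in classes.items():
        if z == 0:
            scores[metabolite] = rng.gauss(0.0, 1.0)
        else:
            centre = rng.choice([-alt_shift, alt_shift])
            scores[metabolite] = rng.gauss(centre, 1.0)
    return scores


def feature_statistic(match_weights, scores):
    """Noise-free feature summary statistic: weighted linear combination
    of the scores of the metabolites the feature may match."""
    return sum(w * scores[m] for m, w in match_weights.items())


# Hypothetical toy inputs: latent classes, weighted network edges,
# and per-feature matching weights (confidence of each candidate match).
classes = {"A": 1, "B": 1, "C": 0}
edges = [("A", "B", 1.0), ("B", "C", 0.5)]
matches = {"f1": {"A": 0.7, "C": 0.3}, "f2": {"B": 1.0}}

rng = random.Random(0)
scores = draw_scores(classes, rng)
stats = {f: feature_statistic(w, scores) for f, w in matches.items()}
log_prior = potts_log_prior(classes, edges)
```

In this toy setting, feature `f2` has a single candidate, so its statistic equals the score of metabolite `B`, whereas `f1` blends the scores of `A` and `C` according to its matching weights. In the model described above, those weights and the class labels are unknown and inferred from the data rather than fixed in advance.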

Bayesian functional analysis for untargeted metabolomics data with matching uncertainty and small sample sizes
