Abstract: Untargeted metabolomics based on liquid chromatography-mass spectrometry technology is quickly gaining widespread application, given its ability to depict the global metabolic pattern in biological samples. However, the data are noisy and plagued by the lack of clear identity of data features measured from samples. Multiple potential matchings exist between data features and known metabolites, while the truth can only be one-to-one matches. Some existing methods attempt to reduce the matching uncertainty, but are far from being able to remove the uncertainty for most features. The existence of the uncertainty causes major difficulty in downstream functional analysis. To address these issues, we develop a novel approach for Bayesian Analysis of Untargeted Metabolomics data (BAUM) to integrate previously separate tasks into a single framework, including matching uncertainty inference, metabolite selection and functional analysis. By incorporating the knowledge graph between variables and using relatively simple assumptions, BAUM can analyze datasets with small sample sizes. By allowing different confidence levels of feature-metabolite matching, the method is applicable to datasets in which feature identities are partially known. Simulation studies demonstrate that, compared with other existing methods, BAUM achieves better accuracy in selecting important metabolites that tend to be functionally consistent and assigning confidence scores to feature-metabolite matches. We analyze a COVID-19 metabolomics dataset and a mouse brain metabolomics dataset using BAUM. Even with a very small sample size of 16 mice per group, BAUM is robust and stable. It finds pathways that conform to existing knowledge, as well as novel pathways that are biologically plausible.

Untargeted metabolomics based on liquid chromatography-mass spectrometry technology depicts the global metabolic pattern in biological samples.
However, multiple potential matchings exist between data features and known metabolites, while the truth can only be one-to-one matches, which causes major difficulty in downstream functional analysis. To address these issues, we develop a novel approach for Bayesian Analysis of Untargeted Metabolomics data (BAUM) to integrate previously separate tasks into a single framework, including matching uncertainty inference, metabolite selection, and functional analysis. The figure illustrates the overall setup of BAUM. The observed data for the model include the feature-level summary statistics computed from the observed metabolic features and the clinical outcome (optional), the potential feature-metabolite matches and their confidence measures, and the known metabolic network structure. The outputs of the model include the false discovery rate (FDR) for each metabolite and the strength of each feature-metabolite match. We use a Bayesian latent factor model to characterize the observed feature summary statistics and link them to the unobserved metabolite behavior. We assign a Multinomial prior (with given prior probabilities) to the matching indicators, a normal prior to the null component score, a Dirichlet Process prior to the alternative component score, and a weighted Potts prior to the metabolite latent class indicators. Generally, the observed summary statistic of a feature is a linear combination of the unobserved scores of its linked metabolites. The weights reflect the confidence level of the metabolite-feature annotation and are estimated from the data. The metabolites are segregated into two latent classes: the clinically relevant class (alternative component) and the clinically irrelevant class (null component). The two classes have different distributions of metabolite scores. Metabolites that are connected on the metabolic network are more likely to belong to the same class.
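The structure described above can be made concrete with a toy forward simulation. The sketch below is not the BAUM sampler and does not perform any inference; it only generates data with the same shape of dependencies. Every name in it (`simulate_feature_statistics`, `adjacency`, `candidates`, `prior_conf`) is hypothetical, and several components are deliberate simplifications: a crude neighbor-smoothing loop stands in for the weighted Potts prior, a two-point shifted mixture stands in for the Dirichlet Process, and the matching weights are drawn from a Dirichlet centered on the prior confidences.

```python
import numpy as np


def simulate_feature_statistics(adjacency, candidates, prior_conf, seed=0):
    """Toy forward simulation of the model structure (illustrative only).

    adjacency  : (M, M) 0/1 array, hypothetical metabolic network.
    candidates : list of lists, candidate metabolite indices per feature.
    prior_conf : list of lists, positive prior confidence per candidate.
    """
    rng = np.random.default_rng(seed)
    n_metab = adjacency.shape[0]

    # Latent class per metabolite: 1 = clinically relevant, 0 = irrelevant.
    # Assumption: simple neighbor smoothing mimics the tendency of connected
    # metabolites to share a class (a stand-in for the weighted Potts prior).
    z = rng.integers(0, 2, size=n_metab)
    for _ in range(5):
        for m in range(n_metab):
            nbrs = np.flatnonzero(adjacency[m])
            if nbrs.size:
                p_relevant = (z[nbrs].sum() + 0.5) / (nbrs.size + 1.0)
                z[m] = int(rng.random() < p_relevant)

    # Metabolite scores: the null class is N(0, 1); the alternative class is
    # shifted away from zero (assumed stand-in for the Dirichlet Process).
    null_scores = rng.normal(0.0, 1.0, size=n_metab)
    alt_scores = rng.choice([-3.0, 3.0], size=n_metab) + rng.normal(size=n_metab)
    scores = np.where(z == 1, alt_scores, null_scores)

    # Each feature statistic is a weighted combination of the scores of its
    # candidate metabolites, plus a small amount of noise.
    stats = np.empty(len(candidates))
    weights = []
    for f, (cands, conf) in enumerate(zip(candidates, prior_conf)):
        w = rng.dirichlet(np.asarray(conf, dtype=float) * 10.0)
        weights.append(w)
        stats[f] = w @ scores[np.asarray(cands)] + rng.normal(0.0, 0.1)
    return z, scores, weights, stats


if __name__ == "__main__":
    adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
    z, scores, weights, stats = simulate_feature_statistics(
        adj, candidates=[[0, 1], [1, 2]], prior_conf=[[0.7, 0.3], [0.5, 0.5]]
    )
    print("classes:", z)
    print("feature statistics:", stats)
```

In the actual method the direction is reversed: the feature statistics, candidate matches and network are observed, and the classes, scores and weights are inferred from them.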

Unified Bayesian representation for high-dimensional multi-modal biomedical data for small-sample classification

Unified Analysis of Multi-order Tensors for Integrative Molecular Profiling

Sparse Bayesian Multiview Learning for Simultaneous Association Discovery and Diagnosis of Alzheimer's Disease.

DIVERSE: Bayesian Data IntegratiVE learning for precise drug ResponSE prediction

Handling Ill-Conditioned Omics Data With Deep Probabilistic Models

Bayesian sparse graphical models for classification with application to protein expression data

Bayesian functional analysis for untargeted metabolomics data with matching uncertainty and small sample sizes

Multi-Channel Stochastic Variational Inference for the Joint Analysis of Heterogeneous Biomedical Data in Alzheimer's Disease

Functional Integrative Bayesian Analysis of High-dimensional Multiplatform Genomic Data

A Fast Algorithm for Bayesian Multi-Locus Model in Genome-Wide Association Studies

Bayesian Conditional Gaussian Network Classifiers with Applications to Mass Spectra Classification

Scalable Bayesian regression in high dimensions with multiple data sources

A Bayesian latent class extension of naive Bayesian classifier and its application to the classification of gastric cancer patients

Sparse Bayesian Learning for Identifying Imaging Biomarkers in AD Prediction.

Detecting Disease-Associated Genomic Outcomes Using Constrained Mixture of Bayesian Hierarchical Models for Paired Data.

Multinomial belief networks for healthcare data

Heterogeneous multimodal biomarkers analysis for Alzheimer's disease via Bayesian network

Unsupervised Bayesian classification for models with scalar and functional covariates

Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling

High-dimensional Feature Selection Using Hierarchical Bayesian Logistic Regression with Heavy-tailed Priors

Bayesian Joint Additive Factor Models for Multiview Learning