Bayesian functional analysis for untargeted metabolomics data with matching uncertainty and small sample sizes
Guoxuan Ma, Jian Kang, Tianwei Yu
DOI: https://doi.org/10.1093/bib/bbae141
IF: 9.5
2024-04-08
Briefings in Bioinformatics
Abstract: Untargeted metabolomics based on liquid chromatography-mass spectrometry technology is quickly gaining widespread application, given its ability to depict the global metabolic pattern in biological samples. However, the data are noisy and plagued by the lack of clear identity of data features measured from samples. Multiple potential matchings exist between data features and known metabolites, while the truth can only be one-to-one matches. Some existing methods attempt to reduce the matching uncertainty, but are far from being able to remove the uncertainty for most features. The existence of the uncertainty causes major difficulty in downstream functional analysis. To address these issues, we develop a novel approach for Bayesian Analysis of Untargeted Metabolomics data (BAUM) to integrate previously separate tasks into a single framework, including matching uncertainty inference, metabolite selection and functional analysis. By incorporating the knowledge graph between variables and using relatively simple assumptions, BAUM can analyze datasets with small sample sizes. By allowing different confidence levels of feature-metabolite matching, the method is applicable to datasets in which feature identities are partially known. Simulation studies demonstrate that, compared with other existing methods, BAUM achieves better accuracy in selecting important metabolites that tend to be functionally consistent and assigning confidence scores to feature-metabolite matches. We analyze a COVID-19 metabolomics dataset and a mouse brain metabolomics dataset using BAUM. Even with a very small sample size of 16 mice per group, BAUM is robust and stable. It finds pathways that conform to existing knowledge, as well as novel pathways that are biologically plausible.
Untargeted metabolomics based on liquid chromatography-mass spectrometry technology depicts the global metabolic pattern in biological samples. However, multiple potential matchings exist between data features and known metabolites, while the truth can only be one-to-one matches, which causes major difficulty in downstream functional analysis. To address these issues, we develop a novel approach for Bayesian Analysis of Untargeted Metabolomics data (BAUM) to integrate previously separate tasks into a single framework, including matching uncertainty inference, metabolite selection, and functional analysis. The figure illustrates the overall setup of BAUM. The observed data for the model include the feature-level summary statistics computed from the observed metabolic features and the (optional) clinical outcome, the potential feature-metabolite matches and their confidence measures, and the known metabolic network structure. The outputs of the model include the false discovery rate (FDR) for each metabolite and the strength of each feature-metabolite matching. We use a Bayesian latent factor model to characterize the observed feature summary statistics and link them to the unobserved metabolite behavior. We assign a multinomial prior, with specified prior probabilities, to the matching indicators; a normal prior to the null component score; a Dirichlet process prior to the alternative component score; and a weighted Potts prior to the metabolite latent class indicators. Generally, the observed summary statistic of a feature is a linear combination of the unobserved scores of its linked metabolites. The weights reflect the confidence level of the metabolite-feature annotation and are estimated from the data. The metabolites are segregated into two latent classes: the clinically relevant class (alternative component) and the clinically irrelevant class (null component). The two classes have different distributions of metabolite scores. Metabolites that are connected on the metabolic network are more likely to belong to the same class.
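The linear-combination relationship described above can be illustrated with a small forward simulation. This is only a minimal sketch under stated assumptions, not the BAUM implementation. The function names, the weight matrix, the `alt_mean` shift, and the use of a single shifted Gaussian in place of the Dirichlet process alternative component are all hypothetical choices made for illustration. The weights are fixed here, whereas BAUM estimates them from the data, and the network (Potts) prior is omitted.

```python
import random


def sample_metabolite_scores(relevant, rng, alt_mean=3.0):
    """Draw one latent score per metabolite.

    Null-class metabolites get a standard normal score. Relevant-class
    metabolites get a shifted normal score, which is a simplifying stand-in
    (an assumption) for the Dirichlet process alternative component.
    """
    return [rng.gauss(alt_mean if is_relevant else 0.0, 1.0)
            for is_relevant in relevant]


def feature_statistics(weights, scores):
    """Compute each feature statistic as a weighted sum of metabolite scores.

    weights[i][j] is the (hypothetical, fixed) matching weight between
    feature i and metabolite j; a zero means no candidate match.
    """
    return [sum(w * s for w, s in zip(row, scores)) for row in weights]


rng = random.Random(0)

# Three features and four metabolites. Feature 0 has one candidate match,
# while features 1 and 2 each have two candidates with different confidence.
weights = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.7, 0.3, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]
relevant = [True, True, False, False]  # latent class of each metabolite

scores = sample_metabolite_scores(relevant, rng)
stats = feature_statistics(weights, scores)
```

In this toy setup, a feature with a single candidate match simply inherits that metabolite's score, while an ambiguously matched feature blends the scores of its candidates in proportion to the matching weights.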
biochemical research methods, mathematical & computational biology