IDSL.UFA assigns high confidence molecular formula annotations for untargeted LC/HRMS datasets in metabolomics and exposomics

Sadjad Fakouri Baygi, Sanjay K Banerjee, Praloy Chakraborty, Yashwant Kumar, Dinesh Kumar Barupal
DOI: https://doi.org/10.1101/2022.02.02.478834
2022-02-04
Abstract: Untargeted LC/HRMS assays in metabolomics and exposomics aim to characterize the small-molecule chemical space in a biospecimen. To gain maximum biological insight from these datasets, LC/HRMS peaks should be annotated with chemical and functional information, including molecular formula, structure, sub-structure, chemical class, and metabolic pathways. Among these, a molecular formula can be assigned to a majority of LC/HRMS peaks by comparing the theoretical and observed isotopic profiles (MS1) of the underlying ionized compound. For this purpose, we have developed the Integrated Data Science Laboratory for Metabolomics and Exposomics – Ultimate Formula Annotation (IDSL.UFA) R package. In the untargeted metabolomics validation tests, IDSL.UFA ranked the correct molecular formula as the top hit for 54.31%-85.51% of true positive annotations, and within the top five hits for 90.58%-100%. Molecular formula annotations were further validated by MS/MS data. We have implemented novel strategies to 1) generate formula sources and their theoretical isotopic profiles, 2) optimize the ranking of formula hits on the individual and the aligned peak lists, and 3) scale IDSL.UFA-based workflows for studies with larger sample sizes. Annotating the raw data of a publicly available pregnancy metabolome study with IDSL.UFA highlighted hundreds of new pregnancy-related compounds and also suggested the presence of chlorinated perfluorotriether alcohols (Cl-PFTrEAs) in human specimens. IDSL.UFA is useful for human metabolomics and exposomics studies in which the loss of biological insight from untargeted LC/HRMS datasets needs to be minimized. The IDSL.UFA package is available in the R CRAN repository (https://cran.r-project.org/package=IDSL.UFA). Detailed documentation and tutorials are also provided at www.ufa.idsl.me.
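The idea of assigning a formula by comparing theoretical and observed MS1 isotopic profiles can be illustrated with a minimal Python sketch. This is not the IDSL.UFA implementation or its API: the function names (`isotopic_profile`, `profile_similarity`, `rank_candidates`) are hypothetical, isotopologues are binned by nominal mass offset rather than exact mass, and cosine similarity is an assumed stand-in for whatever scoring the package actually uses. Adduct handling, mass accuracy, and retention time are ignored.

```python
from collections import defaultdict
from math import sqrt

# Natural isotope abundances, keyed by nominal mass offset from the
# lightest isotope of each element (e.g. 37Cl is +2 relative to 35Cl).
ISOTOPES = {
    "C": {0: 0.9893, 1: 0.0107},
    "H": {0: 0.999885, 1: 0.000115},
    "N": {0: 0.99636, 1: 0.00364},
    "O": {0: 0.99757, 1: 0.00038, 2: 0.00205},
    "Cl": {0: 0.7576, 2: 0.2424},
}


def convolve(a, b):
    """Combine two offset -> probability distributions."""
    out = defaultdict(float)
    for shift_a, p_a in a.items():
        for shift_b, p_b in b.items():
            out[shift_a + shift_b] += p_a * p_b
    return dict(out)


def isotopic_profile(formula):
    """Theoretical profile for a formula given as {element: count}."""
    profile = {0: 1.0}
    for element, count in formula.items():
        for _ in range(count):
            profile = convolve(profile, ISOTOPES[element])
    return profile


def profile_similarity(theoretical, observed):
    """Cosine similarity between two profiles (assumed scoring, scale-free)."""
    offsets = set(theoretical) | set(observed)
    dot = sum(theoretical.get(k, 0.0) * observed.get(k, 0.0) for k in offsets)
    norm_t = sqrt(sum(v * v for v in theoretical.values()))
    norm_o = sqrt(sum(v * v for v in observed.values()))
    return dot / (norm_t * norm_o) if norm_t and norm_o else 0.0


def rank_candidates(candidates, observed):
    """Sort candidate formulas by similarity to the observed profile."""
    scored = [
        (name, profile_similarity(isotopic_profile(formula), observed))
        for name, formula in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up observed peak intensities (relative to M) with a strong M+2 peak.
    observed = {0: 1.00, 2: 0.64, 4: 0.10}
    candidates = {
        "CH2Cl2": {"C": 1, "H": 2, "Cl": 2},
        "C6H12O6": {"C": 6, "H": 12, "O": 6},
    }
    for name, score in rank_candidates(candidates, observed):
        print(name, round(score, 4))
```

In this toy case the observed intensities are invented for illustration; the strong M+2 and M+4 peaks favor the dichlorinated candidate over the sugar, which is the kind of evidence that makes halogenated compounds such as Cl-PFTrEAs stand out in MS1 data.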