IDSL_MINT: a deep learning framework to predict molecular fingerprints from mass spectra

Sadjad Fakouri Baygi, Dinesh Kumar Barupal
DOI: https://doi.org/10.1186/s13321-024-00804-5
2024-01-21
Journal of Cheminformatics
Abstract: The majority of tandem mass spectrometry (MS/MS) spectra in untargeted metabolomics and exposomics studies lack any annotation. Our deep learning framework, Integrated Data Science Laboratory for Metabolomics and Exposomics - Mass INTerpreter (IDSL_MINT), can translate MS/MS spectra into molecular fingerprint descriptors. IDSL_MINT allows users to leverage the power of the transformer model for mass spectrometry data, similar to the large language models. Models are trained on user-provided reference MS/MS libraries via any customizable molecular fingerprint descriptors. IDSL_MINT was benchmarked using the LipidMaps database and improved the annotation rate of a test study for MS/MS spectra that were not originally annotated using existing mass spectral libraries. IDSL_MINT may improve the overall annotation rates in untargeted metabolomics and exposomics studies. The IDSL_MINT framework and tutorials are available in the GitHub repository at https://github.com/idslme/IDSL_MINT.
chemistry, multidisciplinary; computer science, interdisciplinary applications; computer science, information systems
What problem does this paper attempt to address?
This paper addresses the fact that most tandem mass spectrometry (MS/MS) spectra generated in untargeted metabolomics and exposomics studies lack any annotation. It introduces IDSL_MINT, a deep learning framework that translates MS/MS spectra into molecular fingerprint descriptors to support compound annotation. IDSL_MINT uses a transformer model, similar to those behind large language models, and is trained on a user-provided reference MS/MS library with any customizable molecular fingerprint descriptor. The framework was benchmarked using the LipidMaps database, and in a test study it improved the annotation rate for MS/MS spectra that had not been annotated using existing mass spectral libraries. The authors therefore suggest that IDSL_MINT may improve overall annotation rates in untargeted metabolomics and exposomics studies.
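
To illustrate how a predicted fingerprint might support annotation, the following is a minimal sketch that ranks candidate structures by Tanimoto similarity to a predicted set of fingerprint bits. It is not the IDSL_MINT API or the authors' scoring method: the function names `tanimoto` and `rank_candidates`, the candidate names, and the bit indices are all hypothetical, and Tanimoto similarity is assumed here only because it is a common way to compare binary fingerprints.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' bit indices."""
    if not a and not b:
        return 0.0  # convention chosen for this sketch: two empty fingerprints score 0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_candidates(predicted_bits: set, candidates: dict) -> list:
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, tanimoto(predicted_bits, bits)) for name, bits in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: bit indices a model might predict for one MS/MS spectrum,
# and fingerprints of three made-up candidate structures.
predicted = {1, 4, 7, 9}
library = {
    "candidate_A": {1, 4, 7, 9, 12},
    "candidate_B": {2, 4, 8},
    "candidate_C": {1, 7},
}

ranking = rank_candidates(predicted, library)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy case candidate_A shares four of its five bits with the prediction and ranks first. A real workflow would compute candidate fingerprints with a cheminformatics toolkit and would typically restrict candidates by precursor mass before ranking.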