The IDSM mass spectrometry extension: searching mass spectra using SPARQL

Jakub Galgonek, Jiří Vondrášek
DOI: https://doi.org/10.1093/bioinformatics/btae174
IF: 5.8
2024-04-01
Bioinformatics
Abstract: Summary: The Integrated Database of Small Molecules (IDSM) integrates data from small-molecule datasets, making them accessible through the SPARQL query language. Its unique feature is the ability to search for compounds through SPARQL based on their molecular structure. We extended IDSM to enable mass spectra databases to be integrated and searched for based on mass spectrum similarity. As sources of mass spectra, we employed the MassBank of North America (MoNA) database and the In Silico Spectral Database (ISDB) of natural products. Availability and implementation: The extension is an integral part of IDSM, which is available at https://idsm.elixir-czech.cz. The manual and usage examples are available at https://idsm.elixir-czech.cz/docs/ms. The source codes of all IDSM parts are available under open-source licences at https://github.com/idsm-src.
biochemical research methods,biotechnology & applied microbiology,mathematical & computational biology
What problem does this paper attempt to address?
The paper addresses the limited accessibility and interoperability of small-molecule mass spectrometry data. It introduces an extension to the Integrated Database of Small Molecules (IDSM) that lets users search mass spectra databases with the SPARQL query language and retrieve spectra by mass spectrum similarity. The extension integrates spectra from two sources, the MassBank of North America (MoNA) and the In Silico Spectral Database (ISDB) of natural products, and describes them with existing ontologies (such as SIO and PSI-MS) so that they can be linked to other semantic databases. The paper also describes how spectrum matching algorithms are integrated into the SPARQL engine, so that a similarity search can be expressed as part of an ordinary query. As a result, a single SPARQL query can combine a spectrum similarity search with data from the other datasets integrated in IDSM, rather than requiring separate tools for each step.
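To make the idea of a mass spectrum similarity search more concrete, the following Python sketch computes a cosine score between two peak lists. It is an illustration of one widely used similarity measure, not the algorithm implemented in IDSM; the function name `cosine_similarity`, the `(m/z, intensity)` peak representation, the greedy peak pairing, and the `tolerance` default are all assumptions made for this example.

```python
from math import sqrt


def cosine_similarity(spec_a, spec_b, tolerance=0.01):
    """Greedy cosine score between two peak lists of (m/z, intensity) pairs.

    Hypothetical helper for illustration only; not the IDSM implementation.
    """
    # Collect candidate peak pairs whose m/z values agree within the tolerance.
    candidates = []
    for i, (mz_a, int_a) in enumerate(spec_a):
        for j, (mz_b, int_b) in enumerate(spec_b):
            if abs(mz_a - mz_b) <= tolerance:
                candidates.append((int_a * int_b, i, j))

    # Take the strongest pairs first, using each peak at most once.
    candidates.sort(reverse=True)
    used_a, used_b = set(), set()
    dot = 0.0
    for product, i, j in candidates:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            dot += product

    # Normalise by the intensity norms of both spectra.
    norm_a = sqrt(sum(intensity * intensity for _, intensity in spec_a))
    norm_b = sqrt(sum(intensity * intensity for _, intensity in spec_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Made-up peak lists, used only to show how the function is called.
    query = [(100.0, 1.0), (150.0, 0.5)]
    reference = [(100.005, 0.9), (200.0, 0.4)]
    print(cosine_similarity(query, reference))
```

A score of 1 means the matched peaks account for all of the intensity in both spectra, and 0 means that no peaks match within the tolerance. In a database search, the query spectrum would be scored against each stored spectrum and only results above a chosen threshold returned.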