De Novo Molecular Structure Generation from Mass Spectra

Yanmin Liu, Xuan Zhang, Wei Zhao, Daming Zhu, Xuefeng Cui
DOI: https://doi.org/10.1109/bibm58861.2023.10385903
2023-01-01
Abstract: Mass spectrometry is a key technology for identifying small molecules. However, traditional methods that rely on database comparison struggle with newly discovered molecules that are absent from the database. Recent advances in deep learning allow direct analysis of mass spectra, making it possible to predict chemical structures without a database. We find that accurately predicting hydrogen atoms is a major challenge in chemical structure prediction, especially because hydrogens are not explicitly represented in SMILES. To address this challenge, we introduce MS2SMILES, a novel approach that treats hydrogen atoms as implicitly linked to heavy atoms. This allows the model to learn to predict both heavy atoms and hydrogen atoms accurately during training, rather than focusing on heavy atoms alone. Additionally, MS2SMILES incorporates SMILES grammar rules when predicting chemical structures, increasing the reliability of the generated SMILES strings. We tested MS2SMILES on the GNPS and CASMI 2016 datasets, where it achieved SMILES prediction accuracies of 53.6% and 63.8%, respectively. These results represent significant improvements of 19.9% and 10.9%, respectively, over the current leading method.
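To make the idea of hydrogens being implicitly attached to heavy atoms concrete, the following is a minimal illustrative sketch, not the MS2SMILES implementation. It assumes a restricted SMILES subset (neutral, non-aromatic C, N, O, F, Cl, Br; bonds `-`, `=`, `#`; branches; single-digit ring closures) and default valences. The function name `heavy_atoms_with_hydrogens` and the valence table are hypothetical choices made for this example. It pairs each heavy atom with its implied hydrogen count and rejects strings that violate basic SMILES grammar.

```python
# Default valences for a small organic subset.
# Assumption: neutral, non-aromatic atoms only; no bracket atoms or charges.
VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1, "Br": 1}
BOND_ORDER = {"-": 1, "=": 2, "#": 3}


def heavy_atoms_with_hydrogens(smiles):
    """Return [(element, implicit_hydrogen_count), ...] in SMILES order."""
    atoms = []    # element symbol of each heavy atom
    used = []     # sum of bond orders already used by each heavy atom
    stack = []    # atoms at which a branch was opened with "("
    rings = {}    # open ring closures: digit -> (atom index, bond order)
    prev = None   # index of the atom the next atom will bond to
    pending = 1   # bond order to apply to the next bond
    i = 0
    while i < len(smiles):
        ch, two = smiles[i], smiles[i:i + 2]
        if two in VALENCE:            # two-letter symbols such as Cl, Br
            sym, step = two, 2
        elif ch in VALENCE:
            sym, step = ch, 1
        else:
            sym, step = None, 1
        if sym is not None:
            atoms.append(sym)
            used.append(0)
            idx = len(atoms) - 1
            if prev is not None:      # bond the new atom to the previous one
                used[prev] += pending
                used[idx] += pending
            prev, pending = idx, 1
        elif ch in BOND_ORDER:
            pending = BOND_ORDER[ch]
        elif ch == "(":
            if prev is None:
                raise ValueError("branch opened before any atom")
            stack.append(prev)
        elif ch == ")":
            if not stack:
                raise ValueError("unmatched ')'")
            prev = stack.pop()
        elif ch.isdigit():
            if prev is None:
                raise ValueError("ring closure before any atom")
            if ch in rings:           # second occurrence closes the ring
                other, order = rings.pop(ch)
                order = max(order, pending)
                used[prev] += order
                used[other] += order
            else:                     # first occurrence opens the ring
                rings[ch] = (prev, pending)
            pending = 1
        else:
            raise ValueError(f"unsupported character {ch!r}")
        i += step
    if stack or rings:
        raise ValueError("unclosed branch or ring")
    result = []
    for sym, u in zip(atoms, used):
        h = VALENCE[sym] - u          # remaining valence is filled by hydrogens
        if h < 0:
            raise ValueError(f"valence exceeded on {sym}")
        result.append((sym, h))
    return result


print(heavy_atoms_with_hydrogens("CCO"))  # ethanol: [('C', 3), ('C', 2), ('O', 1)]
```

Under these assumptions, a sequence model could emit such (heavy atom, hydrogen count) pairs instead of bare atom symbols, and the grammar checks indicate the kind of constraint that might be enforced during decoding. How MS2SMILES actually encodes and constrains its outputs is not specified in the abstract.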