Tandem mass spectrum prediction for small molecules using graph transformers

Adamo Young, Hannes Röst, Bo Wang
DOI: https://doi.org/10.1038/s42256-024-00816-8
IF: 23.8
2024-04-06
Nature Machine Intelligence
Abstract: Tandem mass spectra capture fragmentation patterns that provide key structural information about molecules. Although mass spectrometry is applied in many areas, the vast majority of small molecules lack experimental reference spectra. For over 70 years, spectrum prediction has remained a key challenge in the field. Existing deep learning methods do not leverage global structure in the molecule, potentially resulting in difficulties when generalizing to new data. In this work we propose the MassFormer model for accurately predicting tandem mass spectra. MassFormer uses a graph transformer architecture to model long-distance relationships between atoms in the molecule. The transformer module is initialized with parameters obtained through a chemical pretraining task, then fine-tuned on spectral data. MassFormer outperforms competing approaches for spectrum prediction on multiple datasets and accurately models the effects of collision energy. Gradient-based attribution methods reveal that MassFormer can identify compositional relationships between peaks in the spectrum. When applied to spectrum identification problems, MassFormer generally surpasses the performance of existing prediction-based methods.
computer science, artificial intelligence, interdisciplinary applications