LipiDetective - a deep learning model for the identification of molecular lipid species in tandem mass spectra

Vivian Wuerf, Nikolai Koehler, Florian Molnar, Lisa Hahnefeld, Robert Gurke, Michael Witting, Josch K. Pauling
DOI: https://doi.org/10.1101/2024.10.07.617094
2024-10-11
Abstract: Lipids are involved in many vital processes within the cell, and alterations in lipid homeostasis have been associated with various diseases such as cancer and type 2 diabetes. Confidently identifying lipids in samples is a prerequisite for understanding the many functions lipids fulfill in health and disease. However, accurately identifying molecular lipid species from tandem mass spectrometry data remains a key challenge in lipidomics. Most current approaches rely on custom pipelines that process the measured spectra and match them against in-house reference spectral libraries, which hinders the comparability of results. To address this challenge, a transformer model called LipiDetective was developed and trained on a dataset composed of reference spectra measured from lipid standards, spectra from databases, and spectra from published experiments, covering both shotgun and liquid chromatography mass spectrometry. LipiDetective demonstrates, for the first time, that artificial neural networks can learn characteristic lipid fragmentation patterns and use them to automatically and accurately annotate molecular lipid species in tandem mass spectra, independently of the experimental setup. The model can even correctly predict lipid species for which it has never seen a spectrum, because it generalizes the fragmentation patterns it has learned. Analysis of the integrated gradients reveals that LipiDetective focuses on relevant peaks that can be matched to known fragments and are therefore interpretable by humans. LipiDetective thus has the potential to become a valuable tool that aids the lipid identification process and supports the comparability of results from different sources. Beyond providing LipiDetective as a ready-to-use application, this work primarily offers a deeper understanding of how the model functions and how future deep learning models for lipid identification in mass spectra could be improved.
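The abstract attributes the model's interpretability to an integrated-gradients analysis. As a rough illustration of what that kind of attribution computes, the sketch below applies the standard integrated-gradients definition (attribution of input i = (x_i - x'_i) times the average gradient along the straight path from a baseline x' to the input x) to a toy sigmoid score over a few binned spectrum intensities. Everything here is hypothetical: the four m/z bins, the weights, the all-zero "empty spectrum" baseline, and the names `toy_score`, `toy_score_grad`, and `integrated_gradients` are illustrative assumptions, not LipiDetective's transformer, input encoding, or attribution setup.

```python
import math


def toy_score(x, w):
    """Hypothetical stand-in for a model's class score: sigmoid(w . x)."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def toy_score_grad(x, w):
    """Analytic gradient of toy_score with respect to each input bin."""
    s = toy_score(x, w)
    return [s * (1.0 - s) * wi for wi in w]


def integrated_gradients(x, baseline, grad_fn, steps=50):
    """Midpoint Riemann-sum approximation of integrated gradients.

    IG_i = (x_i - x'_i) * mean over alpha in [0, 1] of dF/dx_i
    evaluated at x' + alpha * (x - x').
    """
    n = len(x)
    avg_grad = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = grad_fn(point)
        for i in range(n):
            avg_grad[i] += g[i] / steps
    return [(xi - b) * gi for xi, b, gi in zip(x, baseline, avg_grad)]


# Four hypothetical m/z bins; bin 1 plays the role of a diagnostic fragment peak.
spectrum = [0.1, 0.9, 0.0, 0.5]
weights = [0.2, 3.0, 1.0, -0.5]
empty = [0.0] * len(spectrum)  # all-zero baseline, i.e. an empty spectrum
attr = integrated_gradients(spectrum, empty, lambda p: toy_score_grad(p, weights))
```

In this toy setting the largest attribution lands on the bin that is both intense and heavily weighted, a bin with zero intensity receives zero attribution, and the attributions sum approximately to the score difference between the spectrum and the baseline (the "completeness" property of integrated gradients). For a real model, the gradients would come from automatic differentiation rather than a hand-written formula.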
Bioinformatics