Deriving Accurate Lipid Classification based on Molecular Formula

Joshua M. Mitchell, Hunter N.B. Moseley
DOI: https://doi.org/10.1101/572883
2019-03-11
Abstract:
Introduction: Although Fourier-transform mass spectrometry has substantially improved our ability to detect lipids and other metabolites, the untargeted and accurate assignment of detected metabolites remains an unsolved problem in metabolomics. New assignment methods, such as our SMIRFE algorithm, can assign elemental molecular formulas to observed spectral features in an untargeted manner, without orthogonal information from tandem MS or chromatography. However, many lipidomics applications require knowing at least the lipid category or class associated with a detected spectral feature in order to derive a biochemical interpretation.
Objectives: Our goal is to develop a method for robustly classifying elemental molecular formula assignments into lipid categories, for application to SMIRFE-generated assignments.
Results: Using machine learning, we developed a method that can predict lipid category and class from SMIRFE molecular formula assignments. Our methods achieve high accuracy (>90%) and precision (>83%) for all eight lipid categories in the LIPIDMAPS database. Model performance was evaluated using sets of theoretical, data-derived, and artifactual molecular formulas. Our models were generalizable, applicable to real-world datasets, and highly discriminating, classifying most molecular formulas into the “not lipid” category. The lipid categories with the highest classification propensities were glycerophospholipids and sphingolipids, matching the categories most prevalent in LIPIDMAPS.
Conclusions: Our methods enable the lipid classification of untargeted molecular formula assignments generated by SMIRFE without orthogonal information, facilitating the biochemical interpretation of highly untargeted lipidomics experiments. However, this lipid classification appears insufficient for validating single-spectrum assignments, though it could be useful for cross-spectrum assignment validation.
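The per-category accuracy and precision figures reported in the abstract are most naturally read as one-vs-rest scores, although the abstract itself does not define them; that reading is an assumption here. The minimal Python sketch below shows why, under this reading, accuracy can sit well above precision when most formulas belong to “not lipid”: true negatives count toward accuracy but not toward precision. The function name `one_vs_rest_metrics` and the toy labels (using the LIPID MAPS category codes GP for glycerophospholipids and SP for sphingolipids) are hypothetical illustrations, not the authors' code or data.

```python
from collections.abc import Sequence


def one_vs_rest_metrics(
    y_true: Sequence[str], y_pred: Sequence[str], category: str
) -> dict[str, float]:
    """Accuracy and precision for one category treated as the positive class.

    Hypothetical helper: assumes per-category scores are one-vs-rest.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        is_true = truth == category
        is_pred = pred == category
        if is_true and is_pred:
            tp += 1  # correctly assigned to this category
        elif is_pred:
            fp += 1  # wrongly assigned to this category
        elif is_true:
            fn += 1  # missed member of this category
        else:
            tn += 1  # correctly kept out of this category
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return {"accuracy": accuracy, "precision": precision}


# Toy, made-up labels: "not lipid" dominates, as in the abstract's description.
y_true = ["GP"] * 5 + ["SP"] * 2 + ["not lipid"] * 13
y_pred = ["GP"] * 4 + ["not lipid"] + ["SP"] * 2 + ["not lipid"] * 12 + ["GP"]

print(one_vs_rest_metrics(y_true, y_pred, "GP"))
# {'accuracy': 0.9, 'precision': 0.8}
```

Here 14 of the 18 correct GP decisions are true negatives, so accuracy reaches 0.9 while one false positive out of five GP predictions holds precision at 0.8.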