Predicting the Predicted: A Comparison of Machine Learning-Based Collision Cross-Section Prediction Models for Small Molecules

Sara M de Cripan, Trisha Arora, Adrià Olomí, Núria Canela, Gary Siuzdak, Xavier Domingo-Almenara
DOI: https://doi.org/10.1021/acs.analchem.4c00630
IF: 7.4
2024-05-24
Analytical Chemistry
Abstract: The application of machine learning (ML) to -omics research is growing at an exponential rate owing to the increasing availability of large amounts of data for model training. Specifically, in metabolomics, ML has enabled the prediction of tandem mass spectrometry and retention time data. More recently, due to the advent of ion mobility, new ML models have been introduced for collision cross-section (CCS) prediction, but those have been trained with different and relatively small data sets...
chemistry, analytical
What problem does this paper attempt to address?
The paper aims to address the accuracy issue of Collision Cross-Section (CCS) prediction in Ion Mobility-Mass Spectrometry (IM-MS). Specifically, the goals of the paper include:

1. **Comparison of Existing Machine Learning Models**: The paper compares four existing machine learning-based CCS prediction models (CCSP2.0, CCSBase, AllCCS, and DeepCCS) and evaluates their performance on the newly released METLIN-CCS dataset.
2. **Evaluation of Model Generalization Ability**: The study assesses the performance of these models on unseen datasets (such as the AllCCS2 library) to evaluate their generalization ability.
3. **Exploration of Molecular Structure Diversity Impact**: It analyzes the impact of molecular structure diversity in the training data on model performance and explores how increasing the number of structurally similar molecules can improve prediction accuracy.
4. **Evaluation of Linear Models and Fingerprint Methods**: The paper compares the performance of fingerprint-based linear models with existing complex machine learning models to explore the effectiveness of simplified models.
5. **Metabolite Annotation Performance**: It evaluates the performance of different models in metabolite annotation based on predicted CCS values, particularly their ability to rank and filter out incorrect candidates.

Overall, the paper attempts to systematically compare different models to identify the best CCS prediction method and explore how to improve existing models to enhance their application value in practical metabolomics analysis.
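The annotation goal, ranking candidates and filtering out incorrect ones by predicted CCS, can be illustrated with a minimal Python sketch. It is not the paper's procedure: the `Candidate` class, the candidate names, the CCS values, and the 3% relative-error tolerance are all hypothetical, chosen only to show the general idea of comparing a measured CCS against model predictions.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate structure with a CCS value predicted by some model (hypothetical)."""
    name: str
    predicted_ccs: float  # in square angstroms


def relative_error_pct(predicted: float, measured: float) -> float:
    """Absolute relative error of a prediction, as a percentage of the measured value."""
    return abs(predicted - measured) / measured * 100.0


def rank_candidates(candidates, measured_ccs, tolerance_pct=3.0):
    """Drop candidates outside the tolerance, then sort the rest by error (best first).

    tolerance_pct is an assumed cutoff for illustration, not a value from the paper.
    """
    scored = [(relative_error_pct(c.predicted_ccs, measured_ccs), c) for c in candidates]
    kept = [(err, c) for err, c in scored if err <= tolerance_pct]
    kept.sort(key=lambda pair: pair[0])
    return [(c.name, round(err, 2)) for err, c in kept]


# Made-up example: three candidates for one measured feature.
candidates = [
    Candidate("candidate_A", 152.0),
    Candidate("candidate_B", 160.5),
    Candidate("candidate_C", 149.0),
]
ranked = rank_candidates(candidates, measured_ccs=150.0)
print(ranked)  # [('candidate_C', 0.67), ('candidate_A', 1.33)]
```

In this toy case, `candidate_B` is filtered out because its predicted CCS deviates by 7% from the measured value, while the two remaining candidates are ordered by how closely their predictions match. A more accurate prediction model allows a tighter tolerance and therefore removes more incorrect candidates.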