Abstract: Compound identification is central to metabolomics and is usually performed by comparing experimental mass spectra against library spectra. However, most compounds are not commercially available as reference standards from which library spectra could be acquired, so MS/MS spectra for such compounds must be predicted. Machine learning and heuristic models have largely failed at this task, except for lipids. Alternatively, quantum chemistry software can be used to predict mass spectra, but quantum chemistry predictions of collision-induced dissociation (CID) mass spectra in LC-MS/MS are rare. We present the CIDMD (Collision-Induced Dissociation via Molecular Dynamics) framework to model CID-based MS/MS spectra. It uses first-principles molecular dynamics (MD) to simulate the physical process of molecular collisions in CID tandem mass spectrometry. First, protonated molecular ions are constructed at specific protonation sites. Then, in MD simulations driven by density functional theory (DFT), these protonated ions are struck by argon collision gas atoms at user-specified velocities, and the resulting bond cleavages are followed for at least 1,000 fs. Each simulation is repeated multiple times from various collision directions, and the fragments from these repeated collisions are accumulated to generate CIDMD in silico mass spectra. Twelve small metabolites (<205 Da) were selected to test the accuracy of this framework against experimental MS/MS spectra. Varying the protomers, collider velocities, number of simulations, simulation time, and impact parameter b cutoffs yielded 261 predicted mass spectra. Compared with their corresponding experimental spectra, these in silico spectra achieved an average entropy similarity score of 624 ± 189 across all 261 spectra, which improved to 828 ± 77 when the optimal parameters and the most probable protomers were used for the 12 molecules. As molecular mass increased, higher collider velocities gave better results.
Similarly, different protomers fragmented very differently; hence, the average CIDMD prediction accuracy decreased as the number of possible protomers and tautomers increased. Mechanistic details showed that a given fragment ion can be produced from different protomers via multiple fragmentation pathways. We propose that CIDMD is a suitable tool to predict the mass spectra of small metabolites, such as those produced by the gut microbiome.
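The final steps described above, accumulating fragments from repeated collision simulations into a spectrum and scoring that spectrum against an experimental one by entropy similarity, can be sketched as follows. This is a minimal illustration under stated assumptions, not the CIDMD implementation: the per-trajectory input format (one list of fragment m/z values per simulated collision), count-based intensities, exact peak matching after rounding m/z, and the use of the unweighted form of spectral entropy similarity are all assumptions, and the function names are hypothetical. The abstract's scores (624, 828) appear to be reported on a 0 to 1000 scale, so the usage note multiplies the 0 to 1 similarity by 1000; that scaling is also an assumption.

```python
import math
from collections import Counter


def accumulate_spectrum(trajectories, decimals=2):
    """Build a stick spectrum {m/z: relative intensity} from simulated collisions.

    Hypothetical input format: one list of fragment-ion m/z values per trajectory.
    Each fragment's intensity is assumed to be proportional to how often it appears
    across all trajectories.
    """
    counts = Counter()
    for fragments in trajectories:
        for mz in fragments:
            # Round m/z so identical fragments from different runs share one key.
            counts[round(mz, decimals)] += 1
    total = sum(counts.values())
    return {mz: c / total for mz, c in counts.items()}


def spectral_entropy(intensities):
    """Shannon entropy of intensities after normalizing them to sum to 1."""
    intensities = list(intensities)
    total = sum(intensities)
    return -sum((i / total) * math.log(i / total) for i in intensities if i > 0)


def entropy_similarity(spec_a, spec_b):
    """Unweighted entropy similarity in [0, 1] for two spectra keyed by m/z.

    Simplification: peaks match only if their (already rounded) m/z keys are
    identical, instead of using an m/z tolerance window.
    """
    # Normalize each spectrum so its intensities sum to 1.
    sum_a = sum(spec_a.values())
    sum_b = sum(spec_b.values())
    a = {mz: i / sum_a for mz, i in spec_a.items()}
    b = {mz: i / sum_b for mz, i in spec_b.items()}
    # Merged spectrum: peak-wise sum; spectral_entropy renormalizes it to sum 1.
    merged = {mz: a.get(mz, 0.0) + b.get(mz, 0.0) for mz in set(a) | set(b)}
    s_a = spectral_entropy(a.values())
    s_b = spectral_entropy(b.values())
    s_ab = spectral_entropy(merged.values())
    # Identical spectra give 1; spectra with no shared peaks give 0.
    return 1.0 - (2.0 * s_ab - s_a - s_b) / math.log(4)


if __name__ == "__main__":
    # Hypothetical fragment lists from three simulated collisions.
    simulated = accumulate_spectrum([[91.05, 65.04], [91.05], [119.05]])
    # Hypothetical experimental peaks {m/z: intensity}.
    experimental = {91.05: 100.0, 65.04: 20.0, 119.05: 35.0}
    print(round(1000 * entropy_similarity(simulated, experimental)))
```

Counting fragment occurrences keeps the sketch independent of any particular MD output format; a real pipeline would also need tolerance-based peak matching and, possibly, the intensity-weighted variant of entropy similarity.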
