Abstract: Non-targeted analysis provides a comprehensive approach to analyzing environmental and biological samples for nearly all chemicals present. A main shortcoming of current analytical methods and workflows is that they provide no quantitative information, which is an important obstacle to understanding environmental fate and human exposure. Herein, we present an in silico quantification method using machine learning for chemicals analyzed using electrospray ionization (ESI). We considered three data sets from different instrumental setups: (i) capillary electrophoresis electrospray ionization-mass spectrometry (CE-MS) in positive ionization mode (ESI+), (ii) liquid chromatography quadrupole time-of-flight mass spectrometry (LC-QTOF/MS) in ESI+, and (iii) LC-QTOF/MS in negative ionization mode (ESI−). We developed and applied two different machine-learning algorithms, a random forest (RF) and an artificial neural network (ANN), to predict the relative response factors (RRFs) of different chemicals based on their physicochemical properties. Chemical concentrations can then be calculated by dividing the measured abundance of a chemical, as peak area or peak height, by its corresponding RRF. We evaluated our models and tested their predictive power using 5-fold cross-validation (CV) and y-randomization. Both the RF and the ANN models showed great promise in predicting RRFs. However, the accuracy of the predictions depended on the data set composition and the experimental setup. For the CE-MS ESI+ data set, the best model predicted measured RRFs with a mean absolute error (MAE) of 0.19 log units and a cross-validation coefficient of determination (Q²) of 0.84 for the testing set. For the LC-QTOF/MS ESI+ data set, the best model predicted measured RRFs with an MAE of 0.32 and a Q² of 0.40. For the LC-QTOF/MS ESI− data set, the best model predicted measured RRFs with an MAE of 0.50 and a Q² of 0.20.
Our findings suggest that machine-learning algorithms can be used to predict concentrations of non-targeted chemicals with reasonable uncertainties, especially in ESI+, while application to ESI− remains a more challenging problem.

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.9b01096. It comprises the chemical names and physicochemical descriptors of the chemicals in the CE-MS ESI+ data set (XLSX), the chemical names and physicochemical descriptors of the chemicals in the LC-QTOF/MS data sets (XLSX), and information on the design of the algorithms and the optimization of the hyperparameters (PDF).
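
The quantification step and the two reported error metrics can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that RRFs are predicted on a log10 scale (suggested by the MAE being reported in log units) and that Q² follows the common definition 1 − PRESS/TSS computed on held-out predictions; the abstract confirms neither detail. All function names and example numbers are hypothetical.

```python
def concentration_from_rrf(abundance, log10_rrf):
    """Estimate concentration as abundance / RRF.

    Assumption: the model outputs log10(RRF), so it is back-transformed first.
    The result is in whatever concentration units the RRF was defined against.
    """
    rrf = 10.0 ** log10_rrf
    return abundance / rrf


def mean_absolute_error(measured, predicted):
    """Mean absolute difference between measured and predicted values."""
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / len(measured)


def q2(measured, predicted):
    """Cross-validated coefficient of determination, assumed here to be
    1 - PRESS / TSS, where `predicted` holds out-of-fold predictions."""
    mean_measured = sum(measured) / len(measured)
    press = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    tss = sum((m - mean_measured) ** 2 for m in measured)
    return 1.0 - press / tss


def fold_error(mae_log10):
    """Convert an MAE in log10 units into a typical multiplicative error."""
    return 10.0 ** mae_log10


if __name__ == "__main__":
    # Hypothetical peak area and predicted log10(RRF) for one chemical.
    print(concentration_from_rrf(2.0e5, 4.0))  # 20.0

    # Hypothetical measured vs. out-of-fold predicted log10(RRF) values.
    measured = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.5, 2.0, 2.5, 4.0]
    print(mean_absolute_error(measured, predicted))
    print(q2(measured, predicted))

    # MAEs reported in the abstract, read as fold errors in the RRF.
    for mae in (0.19, 0.32, 0.50):
        print(mae, round(fold_error(mae), 2))
```

Under these assumptions, an MAE of 0.19 log units corresponds to a typical error of roughly a factor of 1.5 in the predicted RRF, and therefore in the estimated concentration. An MAE of 0.50 corresponds to roughly a factor of 3.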
