Retention time prediction of polycyclic aromatic hydrocarbons in gas chromatography–mass spectrometry using QSPR based on random forests and artificial neural network

Moona Emrarian, Mahmoud Reza Sohrabi, Nasser Goudarzi, Fariba Tadayon
DOI: https://doi.org/10.1007/s11224-020-01614-9
2020-08-22
Structural Chemistry
Abstract: In this study, a quantitative structure–property relationship (QSPR) based on random forests (RF) and an artificial neural network (ANN) was proposed for determining the retention time (RT) of 123 polycyclic aromatic hydrocarbons (PAHs) in tire fire products. The RT data for the PAHs were obtained by gas chromatography–mass spectrometry (GC–MS). For the RF model, the optimum number of trees (n_t) and the optimum number of randomly selected variables used to split each node (m) were found to be 100 and 101, respectively. Several ANN training algorithms were compared, and the Levenberg–Marquardt (LM) algorithm, which gave the minimum mean square error (MSE), was selected as the best. Leave-one-out cross-validation (LOOCV) was applied to assess the validity of the statistical models, using the coefficient of determination (R²), standard error of prediction (SEP), mean absolute error (MAE), relative error of prediction (REP), mean squared error (MSE), mean relative error (MRE), and predicted residual sum of squares (PRESS). The R² values of the RF, stepwise regression artificial neural network (SR-ANN), and RF-ANN models were 0.985, 0.983, and 0.959, respectively. The result is an effective model for predicting the RT of new compounds that does not require experimental steps.
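The leave-one-out validation described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: a one-descriptor least-squares line stands in for the RF and ANN models, the data are hypothetical, and the function names (`fit_line`, `loocv_predictions`, `loocv_stats`) are introduced here only for illustration. Only PRESS, MSE, MAE, and a cross-validated R² (taken here as 1 - PRESS/SS_tot, one common convention) are computed; SEP, REP, and MRE are left out because their exact definitions are not given in the abstract.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (a stand-in for the RF/ANN models)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def loocv_predictions(xs, ys):
    """Predict each sample from a model fitted on all the other samples."""
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        preds.append(a + b * xs[i])
    return preds


def loocv_stats(ys, preds):
    """PRESS, MSE, MAE and cross-validated R2 from left-out predictions."""
    n = len(ys)
    residuals = [y - p for y, p in zip(ys, preds)]
    press = sum(r * r for r in residuals)
    my = mean(ys)
    ss_tot = sum((y - my) ** 2 for y in ys)
    return {
        "PRESS": press,
        "MSE": press / n,
        "MAE": sum(abs(r) for r in residuals) / n,
        "R2": 1 - press / ss_tot,
    }


# Hypothetical descriptor values and retention times (minutes), for illustration only.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
retention_time = [5.1, 7.0, 8.9, 11.2, 12.8, 15.1]

stats = loocv_stats(retention_time, loocv_predictions(descriptor, retention_time))
print(stats)
```

With a real QSPR data set, `fit_line` would be replaced by the chosen regressor and the single descriptor by the full descriptor matrix; the leave-one-out loop and the statistics stay the same.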
chemistry, multidisciplinary, physical, crystallography