Abstract: Retention indices are values that characterize the retention of a compound in gas chromatography. In practice, retention indices are often assumed to depend only on the structure of the molecule and the type of stationary phase, but this approximation is incorrect. This study examines the dependence of retention indices on the column heating rate in linear temperature-programmed mode, using a large and diverse data set. Most records in the NIST 20 database were acquired in this mode. For stationary phases based on poly(5%-diphenyl-95%-dimethyl)siloxane (5%-phenyl-PDMS), a high proportion of records have heating rates of 10-15 K/min. Such high heating rates are rarely used in practice, so the use of such data may cause errors. We searched for groups of records taken from the same primary source and recorded for the same compound and the same stationary phase, but differing in heating rate. For each group, the value D, the slope of the dependence of the retention index on the heating rate, was calculated. D can be either positive or negative. The highest D values and the greatest variation in D are observed for polar stationary phases, but further analysis focused on 5%-phenyl-PDMS because of its greater practical significance. For these stationary phases, the highest D values are observed for aromatic and polyaromatic molecules; oxygen-containing compounds, by contrast, exhibit lower D values. Negative D values are observed for many trimethylsilyl derivatives. A data set of D values for 756 molecules was compiled and published online. D shows almost no correlation with the retention index, the lipophilicity descriptor logP, or molecular weight. Significant correlations were observed with the number of rings, the number of rotatable bonds, and the number of aromatic atoms.
Linear equations quantitatively relating molecular descriptors to the D value were constructed. The number of rings and the number of halogen atoms were shown to contribute positively to the D value, while the number of oxygen atoms and the number of rotatable bonds contribute negatively. The strong influence of descriptors related to the conformational rigidity of molecules and the weak influence of polarity suggest that the entropic factor has a key influence on the D value. A simple empirical linear equation for estimating D is derived and presented in this study. Several machine learning methods for predicting D are compared. The best results are obtained with gradient boosting and random forest. However, the random forest does not achieve high accuracy in predicting the retention indices themselves.
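The calculation of D described in the abstract can be illustrated with a minimal sketch: records are grouped by primary source, compound, and stationary phase, and an ordinary least-squares slope of retention index against heating rate is fitted within each group. The record layout, the field names (`source`, `compound`, `phase`, `rate`, `ri`), and the numeric values below are invented for illustration and are not taken from NIST 20 or from the published data set; the exact fitting procedure used in the study is not specified here, so plain least squares is an assumption.

```python
from collections import defaultdict


def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def compute_d(records):
    """Return {(source, compound, phase): D} for groups with >= 2 distinct rates.

    D is the slope of retention index versus heating rate (index units per K/min).
    """
    groups = defaultdict(list)
    for r in records:
        key = (r["source"], r["compound"], r["phase"])
        groups[key].append((r["rate"], r["ri"]))

    d_values = {}
    for key, points in groups.items():
        rates = [rate for rate, _ in points]
        # A slope is undefined unless the group spans at least two heating rates.
        if len(set(rates)) < 2:
            continue
        d_values[key] = slope(rates, [ri for _, ri in points])
    return d_values


# Hypothetical records (illustrative values only, not real measurements).
records = [
    {"source": "ref_A", "compound": "naphthalene", "phase": "5%-phenyl-PDMS", "rate": 2.0, "ri": 1180.0},
    {"source": "ref_A", "compound": "naphthalene", "phase": "5%-phenyl-PDMS", "rate": 4.0, "ri": 1184.0},
    {"source": "ref_A", "compound": "naphthalene", "phase": "5%-phenyl-PDMS", "rate": 8.0, "ri": 1192.0},
    # Only one heating rate from this source, so the group is skipped.
    {"source": "ref_B", "compound": "naphthalene", "phase": "5%-phenyl-PDMS", "rate": 5.0, "ri": 1186.0},
]

d_values = compute_d(records)
for key, d in d_values.items():
    print(key, round(d, 3))
```

Restricting each group to a single primary source, as the abstract describes, keeps between-laboratory differences from being mistaken for a heating-rate effect.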

Large-scale statistical study of the dependence of retention index on heating rate in temperature-programmed gas chromatography
