High-throughput discovery of chemical structure-polarity relationships combining automation and machine-learning techniques
Hao Xu, Jinglong Lin, Qianyi Liu, Yuntian Chen, Jianning Zhang, Yang Yang, Michael C. Young, Yan Xu, Dongxiao Zhang, Fanyang Mo
DOI: https://doi.org/10.1016/j.chempr.2022.08.008
IF: 23.5
2022-12-09
Chem
Abstract: As an essential attribute of organic compounds, polarity has a profound influence on many molecular properties. Thin-layer chromatography (TLC) is a commonly used technique for empirical polarity estimation. Current TLC techniques require repeated attempts to find suitable development conditions and have low reproducibility because they are poorly standardized. Herein, we describe an automated system that conducts TLC analysis, facilitating high-throughput collection of a large quantity of experimental data under standardized conditions. Using this dataset, machine-learning (ML) methods are employed to construct surrogate models correlating organic compound structures with their polarity, as reflected by the retardation factor (Rf). The trained ML models are able to predict the Rf value curve of organic compounds in different solvent combinations with high accuracy, thus providing general guidelines for the selection of purification conditions and expediting the generation and analysis of quality TLC data.

Introduction

Thin-layer chromatography (TLC) is a commonly used technique in modern chemistry and biology laboratories. As a key chromatography technique, it employs a solid stationary phase and a liquid mobile phase to separate the individual components of a complex mixture on the basis of their relative affinities for the two phases (Figure 1A; ref. 1: Sherma J., Fried B., Handbook of Thin-Layer Chromatography, CRC Press, 2003). TLC analysis is currently used routinely for reaction monitoring, product identification, and determination of chromatography conditions for subsequent purification. Even though highly experienced synthetic practitioners are able to use this tool, TLC techniques often present a hurdle for scientists in synthesis-adjacent fields.
Furthermore, the identification of TLC conditions for new compound classes requires the judicious selection of several variables, most notably the mobile phases and their ratios, to achieve optimal separation. Traditionally, such goals are accomplished through trial and error in a time-consuming and labor-intensive manner.

Figure 1. Context of the work
(A) Thin-layer chromatography (TLC) is a chromatography technique used to separate non-volatile mixtures. Synthetic laboratories use TLC heavily, on a daily basis, to monitor reactions and identify compounds. Choosing suitable TLC conditions is usually time-consuming for novices or for new compounds. The retardation factor (Rf) is the fraction of an analyte in the mobile phase of a chromatographic system. It is defined as the ratio of the distance traveled by the center of a spot to the distance traveled by the solvent front.
(B) A sigmoid function is a mathematical function with a characteristic "S"-shaped curve; its domain is all real numbers and its return value lies in the range 0–1. Because the Rf value has the same value range, we deliberately associate it with the sigmoid function.
(C) The subjective and objective factors of compound Rf value measurement. The subjective factors include the compound's structure and other physical properties, as well as the elution solvents. This information can be mapped to a vector space via feature engineering and then fed to ML algorithms. Other factors, such as chamber size and humidity, can also affect the measurement. The influence of these objective factors should be eliminated as far as possible so that they do not affect model training.

In recent years, cutting-edge techniques in artificial intelligence (AI) have revolutionized the extrapolation of structure-property relationships in chemical sciences (ref. 2).
2 Muratov E.N., Bajorath J., Sheridan R.P., Tetko I.V., Filimonov D., Poroikov V., Oprea -Abstract Truncated-
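The two quantities named in the Figure 1 caption, the retardation factor and the sigmoid function, can be illustrated with a minimal Python sketch. This is not the authors' code or model: the function names `retardation_factor` and `sigmoid` and the input checks are assumptions introduced here for illustration only. The sketch simply shows that Rf, as a ratio of two distances, falls between 0 and 1, which is the same range a sigmoid returns.

```python
import math


def retardation_factor(spot_distance: float, front_distance: float) -> float:
    """Rf: distance traveled by the spot center divided by the
    distance traveled by the solvent front (hypothetical helper)."""
    if front_distance <= 0:
        raise ValueError("solvent front distance must be positive")
    if not 0 <= spot_distance <= front_distance:
        raise ValueError("spot distance must lie between 0 and the solvent front")
    return spot_distance / front_distance


def sigmoid(x: float) -> float:
    """Logistic sigmoid: maps any real x to a value between 0 and 1."""
    # Branch on the sign of x so math.exp never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)
```

For example, a spot that travels 2.0 cm while the solvent front travels 5.0 cm gives `retardation_factor(2.0, 5.0)`, an Rf of 0.4. How the sigmoid's input would be tied to solvent composition is not specified in the text above, so the sketch leaves that mapping out.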
chemistry, multidisciplinary