Abstract: As an essential attribute of organic compounds, polarity has a profound influence on many molecular properties. Thin-layer chromatography (TLC) is a commonly used technique for empirical polarity estimation. Current TLC techniques require repeated attempts to find suitable development conditions and suffer from low reproducibility because they are poorly standardized. Herein, we describe an automated system for conducting TLC analysis, which enables high-throughput collection of a large quantity of experimental data under standardized conditions. Using this dataset, machine-learning (ML) methods are employed to construct surrogate models that correlate the structures of organic compounds with their polarity, as reflected by the retardation factor (Rf). The trained ML models predict the Rf value curve of organic compounds in different solvent combinations with high accuracy, thus providing general guidelines for the selection of purification conditions and expediting the generation and analysis of quality TLC data.

Introduction

Thin-layer chromatography (TLC) is a commonly used technique in modern chemistry and biology laboratories. As a key chromatography technique, it employs a solid stationary phase and a liquid mobile phase to separate the individual components of a complex mixture on the basis of their relative affinities for the two phases (Figure 1A) [1: Sherma J., Fried B. Handbook of Thin-Layer Chromatography. CRC Press, 2003]. TLC analysis is used routinely for reaction monitoring, product identification, and determination of chromatography conditions for subsequent purification. Although highly experienced synthetic practitioners are able to use this tool, TLC techniques often present a hurdle for scientists in synthesis-adjacent fields.
Furthermore, the identification of TLC conditions for new compound classes requires the judicious selection of several variables, most notably the mobile phases and their ratios, to achieve optimal separation. Traditionally, such goals are accomplished through trial and error in a time-consuming and labor-intensive manner.

Figure 1. Context of the work
(A) Thin-layer chromatography (TLC) is a chromatography technique used to separate non-volatile mixtures. Synthetic laboratories use TLC daily to monitor reactions and identify compounds. Choosing suitable TLC conditions is usually time-consuming for novices or for new compounds. The retardation factor (Rf) is the fraction of an analyte in the mobile phase of a chromatographic system. It is defined as the ratio of the distance traveled by the center of a spot to the distance traveled by the solvent front.
(B) A sigmoid function is a mathematical function with a characteristic "S"-shaped curve; its domain is all real numbers and its return value lies in the range 0–1. Because the Rf value has the same value range, we deliberately associate it with the sigmoid function.
(C) The subjective and objective factors of compound Rf value measurement. The subjective factors include the compound's structure and other physical properties, as well as the elution solvents. This information can be mapped to a vector space via feature engineering and then fed to ML algorithms. Other factors, such as chamber size and humidity, can also affect the measurement. The influence of these objective factors should be eliminated as much as possible to avoid their impact on model training.

In recent years, cutting-edge techniques in artificial intelligence (AI) have revolutionized the extrapolation of structure-property relationships in chemical sciences [2: Muratov E.N., Bajorath J., Sheridan R.P., Tetko I.V., Filimonov D., Poroikov V., Oprea
-Abstract Truncated-
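
The Figure 1 caption defines Rf as a distance ratio and associates it with the sigmoid function because both take values between 0 and 1. The following is a minimal Python sketch of those two ideas only. The function names, the choice of the polar-solvent fraction as the sigmoid input, and the `slope` and `midpoint` parameters are illustrative assumptions; they are not taken from the paper and do not represent the authors' model.

```python
import math


def retardation_factor(spot_distance: float, front_distance: float) -> float:
    """Rf = distance traveled by the spot center / distance traveled by the solvent front."""
    if front_distance <= 0:
        raise ValueError("solvent front distance must be positive")
    if not 0 <= spot_distance <= front_distance:
        raise ValueError("spot distance must lie between 0 and the front distance")
    return spot_distance / front_distance


def sigmoid(x: float) -> float:
    """Logistic function: maps any real x to a value between 0 and 1."""
    # Branch on the sign of x so that math.exp never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def rf_curve(polar_fraction: float, slope: float, midpoint: float) -> float:
    """Hypothetical Rf curve: a sigmoid of the polar-solvent fraction.

    `slope` and `midpoint` are assumed, compound-specific shape parameters
    introduced here for illustration only.
    """
    return sigmoid(slope * (polar_fraction - midpoint))


if __name__ == "__main__":
    # A spot that traveled 2.0 cm while the solvent front traveled 5.0 cm.
    print(retardation_factor(2.0, 5.0))
    # Assumed shape parameters; Rf rises as the polar fraction increases.
    for fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(fraction, round(rf_curve(fraction, slope=8.0, midpoint=0.5), 3))
```

With a positive `slope`, the sketch gives an Rf that increases monotonically with the polar-solvent fraction and stays inside the 0–1 range, which is the property the caption relies on when pairing Rf with a sigmoid.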

Exploring the Chemical Subspace of RPLC: a Data Driven Approach

Quantitative Structure Retention-Relationship Modeling: Towards an Innovative General-Purpose Strategy

Integration of Transferable Prediction of Retention Index and Universal Library Search Enhances Exposome Identification Probability in RPLC/HRMS-Based Non-Targeted Analysis

Structure Driven Prediction of Chromatographic Retention Times: Applications to Pharmaceutical Analysis

A Multi-Label Classifier for Predicting the Most Appropriate Instrumental Method for the Analysis of Contaminants of Emerging Concern

Estimating LoD-s Based on the Ionization Efficiency Values for the Reporting and Harmonization of Amenable Chemical Space in Nontargeted Screening LC/ESI/HRMS

Investigating the chemical space coverage of multiple chromatographic and ionization methods using non-targeted analysis on surface and drinking water collected using passive sampling

The application of chemical similarity measures in an unconventional modeling framework c-RASAR along with dimensionality reduction techniques to a representative hepatotoxicity dataset

From chemical similarity measures to an unconventional modeling framework: The application of c-RASAR along with dimensionality reduction techniques in a representative hepatotoxicity dataset

Enhancing compound confidence in suspect and non-target screening through machine learning-based retention time prediction

Systematic Approaches for the Encoding of Chemical Groups: A Case Study

Finding features - variable extraction strategies for dimensionality reduction and marker compounds identification in GC-IMS data

Changes in the Cis-Trans Isomer Selectivity of a Reversed-Phase Liquid Chromatography Column During use with Acidic Mobile Phase Conditions

Kernel-Based, Partial Least Squares Quantitative Structure-Retention Relationship Model for UPLC Retention Time Prediction: A Useful Tool for Metabolite Identification

High-throughput discovery of chemical structure-polarity relationships combining automation and machine-learning techniques

An actionable annotation scoring framework for gas chromatography-high-resolution mass spectrometry

Improved hydrophobic subtraction model of reversed-phase liquid chromatography selectivity based on a large dataset with a focus on isomer selectivity

Controlled exploration of chemical space by machine learning of coarse-grained representations

Grouping of complex substances using analytical chemistry data: A framework for quantitative evaluation and visualization

Multiple testing issues in discriminating compound-related peaks and chromatograms from high frequency noise, spikes and solvent-based noise in LC-MS data sets