Machine learning from quantum chemistry to predict experimental solvent effects on reaction rates

Yunsie Chung, William H. Green
DOI: https://doi.org/10.1039/d3sc05353a
IF: 8.4
2024-01-11
Chemical Science
Abstract: Fast and accurate prediction of solvent effects on reaction rates is crucial for kinetic modeling, chemical process design, and high-throughput solvent screening. Despite recent advances in machine learning, a scarcity of reliable data has hindered the development of predictive models that generalize to diverse reactions and solvents. In this work, we generate a large data set with the COSMO-RS method for over 28,000 neutral reactions and 295 solvents and train a machine learning model to predict the solvation free energy and solvation enthalpy of activation (ΔΔG‡solv, ΔΔH‡solv) for a solution-phase reaction. On unseen reactions, the model achieves mean absolute errors of 0.71 and 1.03 kcal/mol for ΔΔG‡solv and ΔΔH‡solv, respectively, relative to the COSMO-RS calculations. The model also provides reliable predictions of relative rate constants within a factor of 4 when tested on experimental data. The presented model can provide nearly instantaneous predictions of kinetic solvent effects or relative rate constants for a broad range of neutral closed-shell or free radical reactions and solvents based only on atom-mapped reaction SMILES and solvent SMILES strings.
chemistry, multidisciplinary
What problem does this paper attempt to address?
This paper addresses how to use machine learning to predict the effect of solvents on reaction rates. Machine learning models can already predict rate constants for gas-phase reactions, but models for liquid-phase or solution-phase reactions are limited and rely on quantum mechanical calculations. The authors used the COSMO-RS method to generate a large data set (over 28,000 neutral reactions and 295 solvents) and trained a machine learning model on it to predict the solvation free energy and solvation enthalpy of activation (ΔΔG‡solv, ΔΔH‡solv) of a reaction. On unseen reactions, the model reached mean absolute errors of 0.71 kcal/mol for ΔΔG‡solv and 1.03 kcal/mol for ΔΔH‡solv relative to the COSMO-RS calculations. It needs only 2D structural information as input, namely an atom-mapped reaction SMILES string and a solvent SMILES string, so predictions are nearly instantaneous. In the study, the model was first pre-trained on a large data set and then fine-tuned on a data set containing more common reactions. The model also accounts for the influence of temperature on the predictions. When validated against an experimental data set, it predicted relative rate constants within a factor of 4, at a much lower computational cost than quantum chemical calculations, for a wide range of neutral closed-shell or radical reactions and solvents. In conclusion, the paper shows how machine learning can provide a general and efficient model for predicting solvent effects on solution-phase reactions, supporting kinetic modeling, chemical process design, and high-throughput solvent screening.
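The link between the two predicted quantities and a relative rate constant can be sketched with transition state theory, where the ratio of the solution-phase to the gas-phase rate constant is exp(-ΔΔG‡solv / RT). The snippet below is a minimal illustration and not the authors' code: the function names are hypothetical, the 298 K reference temperature is an assumption, and the temperature extrapolation assumes that the solvation enthalpy and entropy of activation do not change with temperature.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)
T_REF = 298.0         # assumed reference temperature of the predictions, in K


def rate_ratio(ddg_solv, temperature=T_REF):
    """Return k_solution / k_gas for a given ddG_solv of activation (kcal/mol)."""
    return math.exp(-ddg_solv / (R_KCAL * temperature))


def relative_rate(ddg_solv_a, ddg_solv_b, temperature=T_REF):
    """Return k in solvent A divided by k in solvent B at one temperature."""
    return math.exp(-(ddg_solv_a - ddg_solv_b) / (R_KCAL * temperature))


def ddg_at_temperature(ddg_ref, ddh_ref, temperature):
    """Extrapolate ddG_solv to another temperature.

    Assumption (for illustration only): ddH_solv and ddS_solv of activation
    are constant, so ddG(T) = ddH - T * ddS with ddS fixed at T_REF.
    """
    dds = (ddh_ref - ddg_ref) / T_REF  # from ddG = ddH - T * ddS at T_REF
    return ddh_ref - temperature * dds


if __name__ == "__main__":
    # Hypothetical values in kcal/mol, chosen only to exercise the functions.
    ddg, ddh = -1.2, -2.0
    print("k_soln / k_gas at 298 K:", rate_ratio(ddg))
    ddg_350 = ddg_at_temperature(ddg, ddh, 350.0)
    print("k_soln / k_gas at 350 K:", rate_ratio(ddg_350, 350.0))
```

Under the same relation, an error of 0.71 kcal/mol in ΔΔG‡solv corresponds to a factor of about 3.3 in the rate constant at 298 K, which is consistent in scale with the factor-of-4 agreement reported against experiment.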