Machine learning from quantum chemistry to predict experimental solvent effects on reaction rates

Yunsie Chung, William H. Green
DOI: https://doi.org/10.26434/chemrxiv-2023-f20bg-v3
2024-01-10
Abstract: Fast and accurate prediction of solvent effects on reaction rates is crucial for kinetic modeling, chemical process design, and high-throughput solvent screening. Despite recent advances in machine learning, a scarcity of reliable data has hindered the development of predictive models that generalize to diverse reactions and solvents. In this work, we generate a large data set with the COSMO-RS method for over 28,000 neutral reactions and 295 solvents and train a machine learning model to predict the solvation free energy and solvation enthalpy of activation (ΔΔG‡solv, ΔΔH‡solv) for a solution-phase reaction. On unseen reactions, the model achieves mean absolute errors of 0.71 and 1.03 kcal/mol for ΔΔG‡solv and ΔΔH‡solv, respectively, relative to the COSMO-RS calculations. The model also provides reliable predictions of relative rate constants, within a factor of 4, when tested on experimental data. The presented model can provide nearly instantaneous predictions of kinetic solvent effects or relative rate constants for a broad range of neutral closed-shell or free radical reactions and solvents, based only on atom-mapped reaction SMILES and solvent SMILES strings.
Chemistry
What problem does this paper attempt to address?
This paper addresses how to predict solvent effects on reaction rates quickly and accurately. Specifically, the authors used the COSMO-RS method to generate a large data set covering more than 28,000 neutral reactions and 295 solvents, and trained a machine learning model on it to predict the solvation free energy and solvation enthalpy of activation (\(\Delta \Delta G^\ddagger_{\text{solv}}\) and \(\Delta \Delta H^\ddagger_{\text{solv}}\)) of solution-phase reactions. Such predictions are crucial for kinetic modeling, chemical process design, and high-throughput solvent screening. On unseen reactions, the model achieves mean absolute errors of 0.71 and 1.03 kcal/mol for \(\Delta \Delta G^\ddagger_{\text{solv}}\) and \(\Delta \Delta H^\ddagger_{\text{solv}}\), respectively, relative to the COSMO-RS calculations, and when tested on experimental data it predicts relative rate constants within a factor of 4. The core objective of the paper is a model that makes nearly instantaneous predictions of solvent effects for a wide range of neutral closed-shell or free radical reactions and solvents, using only atom-mapped reaction SMILES and solvent SMILES strings as input. This improves prediction speed and reduces computational cost, making large-scale automated reaction rate estimation possible.
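To make the link between \(\Delta \Delta G^\ddagger_{\text{solv}}\) and a relative rate constant concrete, here is a minimal sketch. It assumes a transition-state-theory relation in which the ratio of rate constants in two solvents is \(\exp[-(\Delta \Delta G^\ddagger_{\text{solv,A}} - \Delta \Delta G^\ddagger_{\text{solv,B}})/RT]\), with all other factors taken to cancel. The function names and the default temperature of 298.15 K are illustrative choices and are not taken from the paper.

```python
import math

# Gas constant in kcal/(mol*K), so energies can be given in kcal/mol.
R_KCAL = 1.987204e-3


def relative_rate_constant(ddg_a, ddg_b, temperature=298.15):
    """Estimate k_A / k_B for one reaction in solvents A and B.

    ddg_a, ddg_b: solvation free energies of activation in kcal/mol.
    Assumes a transition-state-theory form where only the solvation
    term differs between the two solvents (a hypothetical helper).
    """
    return math.exp(-(ddg_a - ddg_b) / (R_KCAL * temperature))


def rate_error_factor(ddg_error, temperature=298.15):
    """Multiplicative error in a rate constant caused by an error
    of ddg_error kcal/mol in the solvation free energy of activation."""
    return math.exp(abs(ddg_error) / (R_KCAL * temperature))


if __name__ == "__main__":
    # A solvent that lowers the barrier by 1 kcal/mol relative to another.
    print(relative_rate_constant(-1.0, 0.0))
    # Rate-constant uncertainty implied by a 0.71 kcal/mol error.
    print(rate_error_factor(0.71))
```

Under these assumptions, an error of 0.71 kcal/mol at 298.15 K corresponds to a factor of about 3.3 in the rate constant. Note that the 0.71 kcal/mol figure is a mean absolute error measured against COSMO-RS calculations, whereas the factor of 4 comes from tests on experimental data, so the two numbers are not directly interchangeable.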