pKalculator: A pKa predictor for C-H bonds

Jan Jensen, Rasmus M. Borup, Nicolai Ree
DOI: https://doi.org/10.26434/chemrxiv-2024-56h5h
2024-03-15
Abstract: Determining the pKa values of various C-H sites in organic molecules offers valuable insights for synthetic chemists in predicting reaction sites. As molecular complexity increases, this task becomes more challenging. This paper introduces pKalculator, a quantum chemical (QM)-based workflow for automatic computations of C-H pKa values, which is used to generate a training dataset for a machine learning (ML) model. The QM workflow is benchmarked against 695 experimentally determined C-H pKa values. The ML model is trained on a diverse dataset of 775 molecules with 3910 C-H sites. Our ML model predicts C-H pKa values with a mean absolute error (MAE) and a root mean squared error (RMSE) of 1.24 and 2.15 pKa units, respectively. Furthermore, we employ our model on 1043 pKa-dependent reactions (Aldol, Claisen, and Michael) and successfully indicate the reaction sites with a Matthews correlation coefficient (MCC) of 0.82.
Chemistry
What problem does this paper attempt to address?
This paper addresses the prediction of the acidity (pKa values) of carbon-hydrogen (C-H) bonds in organic molecules. Knowing the pKa of each C-H site helps synthetic chemists predict reaction sites, but the task becomes harder as molecular complexity increases. The researchers developed pKalculator, a quantum chemistry (QM)-based workflow for the automated calculation of C-H pKa values, benchmarked it against 695 experimentally determined C-H pKa values, and used its output to train a machine learning (ML) model. The model was trained on 3,910 C-H sites in 775 molecules and predicts pKa values with a mean absolute error (MAE) of 1.24 and a root mean squared error (RMSE) of 2.15 pKa units. Applied to 1,043 pKa-dependent reactions (Aldol, Claisen, and Michael), the model identified the reaction sites with a Matthews correlation coefficient (MCC) of 0.82. The paper also compares pKalculator with existing methods, highlighting its higher prediction accuracy and usability.
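The three metrics quoted above can be illustrated with a minimal sketch. The function names and the toy numbers below are assumptions made up for illustration; they are not the paper's data or code. The sketch treats MAE and RMSE as errors between reference and predicted pKa values, and MCC as a score for a binary per-site label (reaction site or not).

```python
import math

def mae(y_true, y_pred):
    # Mean absolute error: average size of the prediction error.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # Root mean squared error: penalizes large errors more than MAE does.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mcc(labels, preds):
    # Matthews correlation coefficient for binary labels (1 = reaction site).
    tp = sum(1 for l, p in zip(labels, preds) if l and p)
    tn = sum(1 for l, p in zip(labels, preds) if not l and not p)
    fp = sum(1 for l, p in zip(labels, preds) if not l and p)
    fn = sum(1 for l, p in zip(labels, preds) if l and not p)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when the denominator vanishes.
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical toy values, NOT taken from the paper.
ref_pka = [20.0, 25.0, 30.0, 35.0]
pred_pka = [21.0, 24.0, 32.0, 35.0]
true_sites = [1, 0, 0, 1, 0, 0]
pred_sites = [1, 0, 0, 0, 0, 1]

print(f"MAE  = {mae(ref_pka, pred_pka):.2f}")
print(f"RMSE = {rmse(ref_pka, pred_pka):.2f}")
print(f"MCC  = {mcc(true_sites, pred_sites):.2f}")
```

RMSE is never smaller than MAE, and a wide gap between the two, as in the reported 1.24 versus 2.15 pKa units, suggests that a minority of sites carry comparatively large errors. MCC ranges from -1 to 1 and remains informative when non-reactive sites far outnumber reactive ones.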