Holistic Prediction of the pKa in Diverse Solvents Based on a Machine-Learning Approach

Qi Yang, Yao Li, Jin-Dong Yang, Yidi Liu, Long Zhang, Sanzhong Luo, Jin-Pei Cheng
DOI: https://doi.org/10.1002/anie.202008528
2020-01-01
Angewandte Chemie
Abstract: While many approaches to predict aqueous pKa values exist, the fast and accurate prediction of non-aqueous pKa values is still challenging. Based on the iBonD experimental pKa database (39 solvents), a holistic pKa prediction model was established using machine learning. Structural and physical-organic-parameter-based descriptors (SPOC) were introduced to represent the electronic and structural features of the molecules. The models trained with a neural network or the XGBoost algorithm showed the best prediction performance, with a low mean absolute error (MAE) of 0.87 pKa units. The approach allows a comprehensive mapping of all possible pKa correlations between different solvents, and it was validated by predicting the aqueous pKa and micro-pKa of pharmaceutical molecules and the pKa values of organocatalysts in DMSO and MeCN with high accuracy. An online prediction platform was constructed based on the current model, which can provide pKa predictions for different types of X-H acidity in the most commonly used solvents.
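The abstract reports accuracy as an MAE in pKa units. As a minimal sketch of how such a figure is computed, the snippet below averages the absolute differences between experimental and predicted values. The function name and all numbers are hypothetical illustrations; they are not taken from the iBonD database or from the paper's models.

```python
def mean_absolute_error(experimental, predicted):
    """Return the mean of |experimental - predicted| over paired values."""
    if len(experimental) != len(predicted):
        raise ValueError("both lists must have the same length")
    if not experimental:
        raise ValueError("at least one pair of values is required")
    total = sum(abs(e - p) for e, p in zip(experimental, predicted))
    return total / len(experimental)


# Hypothetical pKa values for four acids (illustrative only, not real data).
experimental_pka = [4.0, 10.0, 16.0, 25.0]
predicted_pka = [4.5, 9.0, 17.0, 24.0]

mae = mean_absolute_error(experimental_pka, predicted_pka)
print(f"MAE = {mae:.3f} pKa units")
```

An MAE of 0.87 therefore means that, averaged over the evaluated compounds, predictions deviate from experiment by a little under one pKa unit.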