MF-SuP-pKa: Multi-fidelity modeling with subgraph pooling mechanism for pKa prediction

Jialu Wu, Yue Wan, Zhenxing Wu, Shengyu Zhang, Dongsheng Cao, Chang-Yu Hsieh, Tingjun Hou
DOI: https://doi.org/10.1016/j.apsb.2022.11.010
IF: 14.903
2022-11-01
Acta Pharmaceutica Sinica B
Abstract: The acid-base dissociation constant (pKa) is a key physicochemical parameter in chemical science, especially in organic synthesis and drug discovery. Current methodologies for pKa prediction still suffer from a limited applicability domain and a lack of chemical insight. Here we present MF-SuP-pKa (multi-fidelity modeling with subgraph pooling for pKa prediction), a novel pKa prediction model that utilizes subgraph pooling, multi-fidelity learning, and data augmentation. In our model, a knowledge-aware subgraph pooling strategy was designed to capture the local and global environments around the ionization sites for micro-pKa prediction. To overcome the scarcity of accurate pKa data, low-fidelity data (computational pKa) was used to fit the high-fidelity data (experimental pKa) through transfer learning. The final MF-SuP-pKa model was constructed by pre-training on the augmented ChEMBL data set and fine-tuning on the DataWarrior data set. Extensive evaluation on the DataWarrior data set and three benchmark data sets shows that MF-SuP-pKa outperforms state-of-the-art pKa prediction models while requiring much less high-fidelity training data. Compared with Attentive FP, MF-SuP-pKa achieves 23.83% and 20.12% improvement in terms of mean absolute error (MAE) on the acidic and basic sets, respectively.
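The abstract reports improvements as percentage reductions in MAE. As a minimal sketch of how such a figure can be computed, the snippet below assumes the improvement is defined as the MAE reduction relative to the baseline; the abstract does not state the formula, so this definition is an assumption. The function names and all pKa values are hypothetical and are not taken from the paper.

```python
def mean_absolute_error(y_true, y_pred):
    """Mean absolute error between two equal-length, non-empty sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_improvement(mae_baseline, mae_model):
    """Percentage reduction in MAE relative to the baseline (assumed definition)."""
    return 100.0 * (mae_baseline - mae_model) / mae_baseline


# Hypothetical pKa values, for illustration only (not data from the paper).
experimental = [4.76, 9.25, 3.13, 10.64]
baseline_pred = [5.26, 8.75, 3.63, 10.14]  # each prediction is off by 0.5
model_pred = [5.16, 8.85, 3.53, 10.24]     # each prediction is off by 0.4

mae_baseline = mean_absolute_error(experimental, baseline_pred)
mae_model = mean_absolute_error(experimental, model_pred)
improvement = relative_improvement(mae_baseline, mae_model)

print(f"baseline MAE = {mae_baseline:.2f}")
print(f"model MAE    = {mae_model:.2f}")
print(f"improvement  = {improvement:.1f}%")
```

Under this definition, a reported 23.83% improvement would mean the model's MAE is about 0.76 times the baseline's MAE on the same test set.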
pharmacology & pharmacy