Small-molecule pKa prediction with Uni-Mol
Date: September 22, 2023
Recommended image: unimol-qsar:v0.2
Recommended resources: GPU
Contents: small-molecule pKa prediction based on Uni-Mol
Usage: you can run this notebook directly on Bohrium Notebook. Click the blue "Connect" button at the top of the page, select the unimol-qsar:v0.2 image and any node configuration, and the notebook will be ready to run after a short wait. If you run into any problems, please contact bohrium@dp.tech.
License: this work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Background
Definition of pKa
The dissociation constant (pKa) characterizes a solute with a certain degree of dissociation in aqueous solution. It gives a quantitative measure of a molecule's acidity or basicity: as Ka increases, a proton donor becomes more acidic; as Ka decreases, the corresponding proton acceptor becomes more basic.
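Since pKa is the negative base-10 logarithm of Ka, the relationship described above can be checked in a couple of lines of Python (the acetic-acid Ka below is a textbook example value, not data from this tutorial):

```python
import math

def pka_from_ka(ka: float) -> float:
    """pKa = -log10(Ka): a larger Ka (stronger acid) gives a smaller pKa."""
    return -math.log10(ka)

# Acetic acid: Ka ~ 1.8e-5 at 25 C, a weak acid
print(round(pka_from_ka(1.8e-5), 2))  # 4.74
```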
The significance of pKa
The dissociation constant (pKa) is a very important property of an organic compound: it determines the form in which the compound exists in a given medium, and hence its solubility, lipophilicity, bioaccumulation potential, and toxicity. For drug molecules, pKa also affects pharmacokinetic and biochemical behavior. Accurate pKa prediction is therefore valuable in environmental chemistry, biochemistry, medicinal chemistry, and drug development.
Why use Uni-Mol
The acidity or basicity (pKa) of small molecules is an important parameter in biochemistry, with implications for drug design, biomolecular recognition, and reactivity. However, traditional pKa prediction methods mostly represent molecules as 1D sequential tokens or 2D topological graphs, which limits their ability to take 3D information into account. Uni-Mol is a powerful 3D molecular representation learning (MRL) framework that has been shown to perform well on a variety of downstream tasks, especially those that require properly integrated 3D information.
Using Uni-Mol to predict small-molecule pKa values has the following advantages:
pKa is a key parameter for assessing a molecule's acid-base properties and is essential for understanding and designing its physicochemical behavior. Accurate pKa prediction can guide drug and materials design.
Most current pKa prediction methods are based on 1D or 2D molecular representations and struggle to capture critical 3D structural information. Uni-Mol learns molecular properties directly from 3D conformers, making it a strong tool for more accurate pKa prediction.
Uni-Mol's pretrained model has already learned rich chemical knowledge and generalizes well to new molecules. This greatly eases pKa prediction, since large-scale training data is not required.
Uni-Mol is built on the transformer architecture and supports end-to-end multi-task fine-tuning, so pKa prediction can be learned jointly with other molecular property prediction tasks, improving generalization across tasks.
By predicting small-molecule pKa with Uni-Mol, we can better understand the role a molecule plays in biochemical processes, supporting drug design and biomolecular recognition. For example, in drug design, the predicted pKa can guide adjustments to a molecule's acid-base character to improve receptor binding, solubility, and membrane permeability. In biomolecular recognition, knowing a small molecule's acid-base properties helps predict its interactions with proteins, providing useful information for protein engineering and drug-target screening.
Dataset setup
The dataset merges pKa data from the sources cited in the References section below. 80% of the data is randomly selected as the training set and the remaining 20% as the test set.
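The random 80/20 split can be sketched with the Python standard library; the (SMILES, pKa) pairs below are illustrative placeholders, not the tutorial's actual data:

```python
import random

# Placeholder (SMILES, pKa) records standing in for the merged dataset
data = [("CC(=O)O", 4.76), ("c1ccccc1O", 9.95), ("CCN", 10.7),
        ("CC(=O)Nc1ccccc1", 0.5), ("O=C(O)c1ccccc1", 4.2)]

random.seed(0)                 # fix the seed for a reproducible split
random.shuffle(data)
cut = int(0.8 * len(data))     # 80% train, 20% test
train, test = data[:cut], data[cut:]
print(len(train), len(test))   # 4 1
```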
Note: this tutorial is a beginner-oriented demo of pKa prediction with Uni-Mol, so only a modest amount of data was collected and the accuracy is limited. If you are interested in pKa prediction, see the state-of-the-art approaches such as Uni-pKa in the references.
References
Dissociation constant. Baidu Baike. https://baike.baidu.com/item/%E8%A7%A3%E7%A6%BB%E5%B8%B8%E6%95%B0/10919296
Mansouri K, Cariello NF, Korotcov A, Tkachenko V, Grulke CM, Sprankle CS, et al. Open-source QSAR models for pKa prediction using multiple machine learning approaches. J Cheminform 2019;11:60.
Baltruschat M, Czodrowski P. Machine learning meets pKa. F1000Res 2020;9(Chem Inf Sci):113.
Işık M, Levorse D, Rustenburg AS, Ndukwe IE, Wang H, Wang X, et al. pKa measurements for the SAMPL6 prediction challenge for a set of kinase inhibitor-like fragments. J Comput Aided Mol Des 2018;32:1117-38.
Drug Design Data Resource Community. The SAMPL7 data set. Version 1.1. Available from: https://zenodo.org/record/5637494#.Y0AXD7ZBxsY
Wu J, Wan Y, Wu Z, et al. MF-SuP-pKa: multi-fidelity modeling with subgraph pooling mechanism for pKa prediction. Acta Pharm Sin B 2023;13(6):2572-84.
Luo W, Zhou G, Zhu Z, Ke G, Wei Z, Gao Z, et al. Uni-pKa: an accurate and physically consistent pKa prediction through protonation ensemble modeling. ChemRxiv preprint. Cambridge: Cambridge Open Engage; 2023 (not peer-reviewed).
Uni-Mol's built-in pretrained model for molecular property prediction
Uni-Mol training parameters
task: the task to run; five task types are currently supported
- classification: binary (0/1) classification
- regression: regression
- multiclass: multi-class classification
- multilabel_classification: multi-label 0/1 classification
- multilabel_regression: multi-label regression
metrics: the metrics the model should optimize; multiple metrics can be passed as a comma-separated string, and the default is used when left empty. Currently supported metrics:
- classification: auc, auprc, log_loss, f1_score, mcc, recall, precision, cohen_kappa;
- regression: mae, mse, rmse, r2, spearmanr;
- multiclass: log_loss, acc;
- multilabel_classification: log_loss, acc, auprc, cohen_kappa;
- multilabel_regression: mse, mae, rmse, r2;
data_type: the input data type; currently only molecule is supported, with protein, crystal, and other data sources to be opened up later;
split: Uni-Mol uses 5-fold cross-validation by default; the split can be random or by scaffold;
save_path: the output path for the current task; existing files are overwritten by default;
epochs, learning_rate, batch_size, early_stopping: training hyperparameters exposed by Uni-Mol;
Two ways of supplying training data are currently supported:
- the path to a training-set CSV file, which must contain SMILES and TARGET columns;
- custom conformers: pass a dictionary containing the corresponding atoms and coordinates;
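The two input formats described above can be illustrated as follows. The SMILES/TARGET column names come from the text; the molecules, the placeholder geometry, and the "target" label key are illustrative assumptions, not confirmed by this tutorial:

```python
import csv
import io

# Format 1: a CSV with SMILES and TARGET columns (TARGET holds the pKa label)
csv_text = "SMILES,TARGET\nCC(=O)O,4.76\nc1ccccc1O,9.95\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
print(rows[0]["SMILES"], rows[0]["TARGET"])  # CC(=O)O 4.76

# Format 2: custom conformers as a dict of atoms and coordinates
# (placeholder geometry in angstroms; one list entry per molecule;
#  the "target" key name is an assumption for the training label)
custom_data = {
    "atoms": [["C", "C", "O", "O"]],
    "coordinates": [[[0.00, 0.00, 0.00],
                     [1.50, 0.00, 0.00],
                     [2.10, 1.10, 0.00],
                     [2.10, -1.10, 0.00]]],
    "target": [4.76],
}
# each atom must have exactly one 3D coordinate
assert len(custom_data["atoms"][0]) == len(custom_data["coordinates"][0])
```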
Uni-Mol prediction parameters. Prediction proceeds in two steps: first load the trained model, then run prediction;
- load_model: the path to the trained model;
- prediction likewise supports custom input conformers;
Training log (abridged; per-epoch lines omitted):

2023-09-22 19:06:17 | Uni-Mol(QSAR) | Anomaly clean with 3 sigma threshold: 4858 -> 4846
Start generating conformers... 4846it [00:19, 252.17it/s]
Failed to generate conformers for 0.00% of molecules.
Failed to generate 3d conformers for 0.08% of molecules.
Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
start training Uni-Mol:unimolv1
fold 0 (early stopping at epoch 20): {'mse': 3.837353, 'mae': 1.3428538, 'pearsonr': 0.796152655102211, 'spearmanr': 0.8045261350951081, 'r2': 0.6254662084813238}
fold 1 (early stopping at epoch 23): {'mse': 3.8306642, 'mae': 1.2842724, 'pearsonr': 0.7984630120060101, 'spearmanr': 0.8129191592212157, 'r2': 0.6125218850491334}
fold 2 (early stopping at epoch 23): {'mse': 4.3433475, 'mae': 1.3079208, 'pearsonr': 0.7468925604362612, 'spearmanr': 0.7679559519480207, 'r2': 0.5203490073181833}
fold 3 (early stopping at epoch 18): {'mse': 4.980758, 'mae': 1.5096344, 'pearsonr': 0.7281056072950604, 'spearmanr': 0.7481936198904351, 'r2': 0.49684808545844905}
fold 4 (early stopping at epoch 20): {'mse': 3.848559, 'mae': 1.3626456, 'pearsonr': 0.8064467171602484, 'spearmanr': 0.8174274438102153, 'r2': 0.642660186891282}
Uni-Mol metrics score: {'mse': 4.16806793096518, 'mae': 1.36146153075709, 'pearsonr': 0.771143726793598, 'spearmanr': 0.7851127062164047, 'r2': 0.5823224163960501}
Uni-Mol & Metric result saved!

Prediction log (abridged):

Start generating conformers... 1215it [00:05, 242.63it/s]
Failed to generate 3d conformers for 0.00% of molecules.
start predict NNModel:unimolv1
final predict metrics score: {'mse': 4.096291747485217, 'mae': 1.2155226306436484, 'pearsonr': 0.776462123413637, 'spearmanr': 0.8107734646652588, 'r2': 0.6000364625249144}
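To read the final test-set scores in more interpretable units, the logged MSE can be converted to an RMSE, which shares the units of pKa (a minimal sketch using only the Python standard library; the numbers are copied from the log above):

```python
import math

# Final test-set metrics copied from the prediction log
final_metrics = {"mse": 4.096291747485217,
                 "mae": 1.2155226306436484,
                 "r2": 0.6000364625249144}

# RMSE = sqrt(MSE); the model is off by roughly 2 pKa units on average
rmse = math.sqrt(final_metrics["mse"])
print(round(rmse, 2))  # 2.02
```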
Summary
This tutorial shows that pKa prediction with Uni-Mol is feasible. To further improve prediction accuracy:
- tune the hyperparameters carefully;
- expand the dataset, since the one collected for this demo is small;
- study Uni-pKa.