
Small-Molecule pKa Prediction with Uni-Mol

Author: 徐凡杰
Date: September 22, 2023

Recommended image: unimol-qsar:v0.2
Recommended compute resource: GPU
Content: small-molecule pKa prediction based on Uni-Mol
How to use: you can run this notebook directly on Bohrium Notebook. Click the blue 开始连接 (Connect) button at the top of the page, choose the unimol-qsar:v0.2 image and any node configuration, wait a moment, and run. If you run into any problems, please contact bohrium@dp.tech.
License: this work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.


Background

Definition of pKa

The dissociation constant Ka characterizes how strongly a solute dissociates in aqueous solution, and pKa = -log10(Ka) gives a quantitative measure of a molecule's acidity or basicity: for a proton donor, a larger Ka (smaller pKa) means stronger acidity, while for a proton acceptor, a smaller Ka (larger pKa) means stronger basicity.

Why pKa matters

The dissociation constant (pKa) is a very important property of an organic compound: it determines the species in which the compound exists in a given medium, and hence its solubility, lipophilicity, bioaccumulation, and toxicity. For drug molecules, pKa also affects pharmacokinetic and biochemical properties. Accurately predicting the pKa of organic compounds is therefore important in environmental chemistry, biochemistry, medicinal chemistry, and drug development.

Why use Uni-Mol

The acidity/basicity (pKa) of a small molecule is an important parameter in biochemistry, with major implications for drug design, biomolecular recognition, and reactivity. However, traditional small-molecule pKa prediction methods mostly represent molecules with 1D sequential tokens or 2D topological graphs, which limits their ability to take 3D information into account. Uni-Mol is a powerful 3D molecular representation learning (MRL) framework that has been shown to perform well on many downstream tasks, especially those that require 3D information to be properly incorporated.

Using Uni-Mol to predict small-molecule pKa has the following advantages:

  1. pKa is a key parameter for assessing the acid-base properties of a molecule and is essential for understanding and designing its physicochemical behavior. Accurate pKa prediction can guide drug and materials design.

  2. Most current pKa prediction methods are built on 1D or 2D molecular representations, which struggle to capture the key 3D structural information. Uni-Mol learns molecular properties directly from 3D conformations, making it a powerful tool for more accurate pKa prediction.

  3. Uni-Mol's pretrained model has already absorbed rich chemical knowledge and generalizes better to new molecules. This greatly benefits pKa prediction, since large-scale training data are not required.

  4. Uni-Mol is built on a Transformer architecture and can be fine-tuned end to end in a multi-task setting, learning pKa prediction jointly with other molecular property prediction tasks to improve generalization on each task.

Predicting small-molecule pKa with Uni-Mol helps us better understand a molecule's role in biochemical processes, which in turn supports drug design and biomolecular recognition. For example, in drug design the predicted pKa can be used to tune the acidity or basicity of a drug molecule and thereby improve its binding to a receptor, its solubility, and its ability to cross biological membranes. In biomolecular recognition, knowing a small molecule's acid-base behavior helps predict its interactions with proteins, providing useful information for protein engineering and drug target screening.

Dataset setup

The dataset merges pKa data from several sources. 80% of the molecules are randomly selected as the training set and the remaining 20% are used as the test set.

Note: this tutorial is a beginner-oriented demo of pKa prediction with Uni-Mol, so only a limited amount of data was collected and the accuracy is modest. If you are interested in pKa prediction, follow the state-of-the-art approaches such as Uni-pKa [7].
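The random 80/20 split described above is easy to reproduce with pandas. The sketch below is illustrative only: the input file name pKa_all.csv is a hypothetical placeholder, and the tutorial itself already ships the pre-split files used later (/bohr/pKa-51eu/v2/pKa_train.csv and pKa_test.csv).

# Minimal sketch of the random 80/20 split (file names are hypothetical).
import pandas as pd

df = pd.read_csv('pKa_all.csv')                # expects SMILES and TARGET columns
train = df.sample(frac=0.8, random_state=42)   # 80% random training split
test = df.drop(train.index)                    # remaining 20% as the test split
train.to_csv('pKa_train.csv', index=False)
test.to_csv('pKa_test.csv', index=False)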


References

  1. 解离常数 (dissociation constant), Baidu Baike: https://baike.baidu.com/item/%E8%A7%A3%E7%A6%BB%E5%B8%B8%E6%95%B0/10919296

  2. Mansouri K, Cariello NF, Korotcov A, Tkachenko V, Grulke CM, Sprankle CS, et al. Open-source QSAR models for pKa prediction using multiple machine learning approaches. J Cheminform 2019;11:60.

  3. Baltruschat M, Czodrowski P. Machine learning meets pKa. F1000Research 2020;9(Chem Inf Sci):113.

  4. Işık M, Levorse D, Rustenburg AS, Ndukwe IE, Wang H, Wang X, et al. pKa measurements for the SAMPL6 prediction challenge for a set of kinase inhibitor-like fragments. J Comput Aided Mol Des 2018;32:1117-38.

  5. Drug Design Data Resource Community. The SAMPL7 data set. Version 1.1. Available from: https://zenodo.org/record/5637494#.Y0AXD7ZBxsY

  6. Wu J, Wan Y, Wu Z, et al. MF-SuP-pKa: multi-fidelity modeling with subgraph pooling mechanism for pKa prediction. Acta Pharm Sin B 2023;13(6):2572-84.

  7. Luo W, Zhou G, Zhu Z, Ke G, Wei Z, Gao Z, et al. Uni-pKa: an accurate and physically consistent pKa prediction through protonation ensemble modeling. ChemRxiv preprint (not peer-reviewed). Cambridge: Cambridge Open Engage; 2023.


Uni-Mol's built-in pretrained model for molecular property prediction

Uni-Mol training parameters

task: the task to run; five task types are currently supported

  • classification: binary (0/1) classification
  • regression: regression
  • multiclass: multi-class classification
  • multilabel_classification: multi-label binary (0/1) classification
  • multilabel_regression: multi-label regression

metrics: the metric(s) the model is optimized for; multiple metrics can be passed as a comma-separated string, and a default is used when left empty (a short example follows the list below). Currently supported metrics:

  • classification: auc, auprc, log_loss, f1_score, mcc, recall, precision, cohen_kappa
  • regression: mae, mse, rmse, r2, spearmanr
  • multiclass: log_loss, acc
  • multilabel_classification: log_loss, acc, auprc, cohen_kappa
  • multilabel_regression: mse, mae, rmse, r2
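For example, to have a regression run report both MAE and R², the comma-separated form described above would look roughly like this; a sketch only, with an arbitrary output directory, and all remaining arguments left at their values from the training cell further below:

# Sketch: passing multiple metrics as a comma-separated string.
from unimol import MolTrain

clf = MolTrain(task='regression',          # regression task, as in this tutorial
               data_type='molecule',
               metrics='mae,r2',           # comma-separated metrics, per the description above
               save_path='./pka_mae_r2')   # hypothetical output directory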

data_type: the type of input data; currently only molecule is supported, with protein, crystal, and other data sources to be added later;

split: Uni-Mol uses 5-fold cross-validation by default; both random splits and scaffold-based splits are supported;

save_path: the output directory of the current task; existing files are overwritten by default;

epochs, learning_rate, batch_size, early_stopping: training hyperparameters exposed by Uni-Mol;

Two input modes are currently supported for model training:

  • the path to a training CSV file containing SMILES and TARGET columns;
  • custom conformations: pass a dictionary containing the corresponding atoms and coordinates (see the sketch after this section);

Uni-Mol prediction parameters: prediction is a two-step process, first load the trained model, then run the prediction;

  • load_model: path to the trained model;
  • prediction likewise supports custom input conformations;
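A minimal sketch of the custom-conformation dictionary mentioned above, assuming per-molecule lists of atom symbols and Nx3 coordinate arrays; the target key and the example geometry are assumptions added for illustration, not part of the documented interface:

# Sketch: custom-conformation input instead of a SMILES CSV file (layout assumed).
from unimol import MolTrain
import numpy as np

data = {
    'atoms': [['O', 'H', 'H']],                        # per-molecule atom symbols
    'coordinates': [np.array([[0.000, 0.000, 0.000],   # per-molecule Nx3 coordinates (Angstrom)
                              [0.957, 0.000, 0.000],
                              [-0.240, 0.927, 0.000]])],
    'target': [15.7],                                   # assumed label key for training
}

clf = MolTrain(task='regression', data_type='molecule', metrics='mse', save_path='./pka_custom')
clf.fit(data)   # assumes the same fit() entry point as the CSV mode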
[1]
# Import the Uni-Mol training and prediction interfaces
from unimol import MolTrain, MolPredict
/opt/conda/lib/python3.8/site-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
[2]
### Initialize the training model
clf = MolTrain(task='regression',        # regression task for continuous pKa values
               data_type='molecule',
               epochs=100,
               learning_rate=0.0001,
               batch_size=32,
               early_stopping=10,        # stop if validation loss does not improve for 10 epochs
               metrics='mse',
               split='random',
               save_path='./pka',
               )
[3]
### Model training - input as a SMILES CSV file
clf.fit('/bohr/pKa-51eu/v2/pKa_train.csv')
2023-09-22 19:06:17 | unimol/data/datareader.py | 138 | INFO | Uni-Mol(QSAR) | Anomaly clean with 3 sigma threshold: 4858 -> 4846
2023-09-22 19:06:18 | unimol/data/conformer.py | 62 | INFO | Uni-Mol(QSAR) | Start generating conformers...
4846it [00:19, 252.17it/s]
2023-09-22 19:06:37 | unimol/data/conformer.py | 66 | INFO | Uni-Mol(QSAR) | Failed to generate conformers for 0.00% of molecules.
2023-09-22 19:06:37 | unimol/data/conformer.py | 68 | INFO | Uni-Mol(QSAR) | Failed to generate 3d conformers for 0.08% of molecules.
2023-09-22 19:06:37 | unimol/train.py | 88 | INFO | Uni-Mol(QSAR) | Output directory already exists: ./pka
2023-09-22 19:06:37 | unimol/train.py | 89 | INFO | Uni-Mol(QSAR) | Warning: Overwrite output directory: ./pka
2023-09-22 19:06:38 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2023-09-22 19:06:39 | unimol/models/nnmodel.py | 103 | INFO | Uni-Mol(QSAR) | start training Uni-Mol:unimolv1
2023-09-22 19:07:01 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/100] train_loss: 1.0112, val_loss: 0.9428, val_mse: 9.4579, lr: 0.000033, 19.1s
2023-09-22 19:07:13 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/100] train_loss: 0.8811, val_loss: 0.8262, val_mse: 8.2988, lr: 0.000067, 12.0s
2023-09-22 19:07:26 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/100] train_loss: 0.7739, val_loss: 0.7050, val_mse: 7.0498, lr: 0.000100, 11.9s
2023-09-22 19:07:39 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/100] train_loss: 0.6270, val_loss: 0.5219, val_mse: 5.2415, lr: 0.000099, 11.9s
2023-09-22 19:07:51 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/100] train_loss: 0.5208, val_loss: 0.6246, val_mse: 6.2129, lr: 0.000098, 11.8s
2023-09-22 19:08:03 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/100] train_loss: 0.4719, val_loss: 0.5582, val_mse: 5.5032, lr: 0.000097, 11.9s
2023-09-22 19:08:15 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/100] train_loss: 0.4475, val_loss: 0.4835, val_mse: 4.8245, lr: 0.000096, 11.8s
2023-09-22 19:08:28 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/100] train_loss: 0.4073, val_loss: 0.4837, val_mse: 4.8350, lr: 0.000095, 12.0s
2023-09-22 19:08:40 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [9/100] train_loss: 0.3868, val_loss: 0.4310, val_mse: 4.3347, lr: 0.000094, 11.9s
2023-09-22 19:08:52 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [10/100] train_loss: 0.3393, val_loss: 0.3843, val_mse: 3.8374, lr: 0.000093, 11.9s
2023-09-22 19:09:05 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [11/100] train_loss: 0.3215, val_loss: 0.4340, val_mse: 4.3170, lr: 0.000092, 12.0s
2023-09-22 19:09:17 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [12/100] train_loss: 0.3152, val_loss: 0.4057, val_mse: 4.0674, lr: 0.000091, 11.8s
2023-09-22 19:09:29 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [13/100] train_loss: 0.2904, val_loss: 0.4150, val_mse: 4.1292, lr: 0.000090, 11.9s
2023-09-22 19:09:41 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [14/100] train_loss: 0.2763, val_loss: 0.4340, val_mse: 4.3483, lr: 0.000089, 11.9s
2023-09-22 19:09:53 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [15/100] train_loss: 0.2616, val_loss: 0.4324, val_mse: 4.3279, lr: 0.000088, 11.9s
2023-09-22 19:10:04 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [16/100] train_loss: 0.2660, val_loss: 0.4182, val_mse: 4.1901, lr: 0.000087, 11.9s
2023-09-22 19:10:16 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [17/100] train_loss: 0.2479, val_loss: 0.4290, val_mse: 4.3007, lr: 0.000086, 11.8s
2023-09-22 19:10:28 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [18/100] train_loss: 0.2425, val_loss: 0.4361, val_mse: 4.3456, lr: 0.000085, 11.7s
2023-09-22 19:10:40 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [19/100] train_loss: 0.2345, val_loss: 0.3985, val_mse: 3.9728, lr: 0.000084, 11.9s
2023-09-22 19:10:52 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [20/100] train_loss: 0.2172, val_loss: 0.4524, val_mse: 4.5049, lr: 0.000082, 11.8s
2023-09-22 19:10:52 | unimol/utils/metrics.py | 228 | WARNING | Uni-Mol(QSAR) | Early stopping at epoch: 20
2023-09-22 19:10:52 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2023-09-22 19:10:53 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 0, result {'mse': 3.837353, 'mae': 1.3428538, 'pearsonr': 0.796152655102211, 'spearmanr': 0.8045261350951081, 'r2': 0.6254662084813238}
2023-09-22 19:10:54 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2023-09-22 19:11:06 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/100] train_loss: 1.0001, val_loss: 0.8550, val_mse: 8.5821, lr: 0.000033, 11.9s
2023-09-22 19:11:19 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/100] train_loss: 0.8606, val_loss: 0.7202, val_mse: 7.2722, lr: 0.000067, 11.9s
2023-09-22 19:11:31 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/100] train_loss: 0.8098, val_loss: 0.6975, val_mse: 6.9917, lr: 0.000100, 11.9s
2023-09-22 19:11:44 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/100] train_loss: 0.6484, val_loss: 0.5016, val_mse: 5.0634, lr: 0.000099, 11.9s
2023-09-22 19:11:57 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/100] train_loss: 0.5485, val_loss: 0.4420, val_mse: 4.4724, lr: 0.000098, 11.8s
2023-09-22 19:12:09 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/100] train_loss: 0.4884, val_loss: 0.4935, val_mse: 5.0232, lr: 0.000097, 11.9s
2023-09-22 19:12:21 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/100] train_loss: 0.4438, val_loss: 0.4463, val_mse: 4.5404, lr: 0.000096, 11.9s
2023-09-22 19:12:33 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/100] train_loss: 0.4250, val_loss: 0.4067, val_mse: 4.1284, lr: 0.000095, 11.8s
2023-09-22 19:12:46 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [9/100] train_loss: 0.3972, val_loss: 0.4076, val_mse: 4.1591, lr: 0.000094, 11.9s
2023-09-22 19:12:58 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [10/100] train_loss: 0.3802, val_loss: 0.4523, val_mse: 4.5871, lr: 0.000093, 11.8s
2023-09-22 19:13:10 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [11/100] train_loss: 0.3349, val_loss: 0.4261, val_mse: 4.3244, lr: 0.000092, 11.9s
2023-09-22 19:13:21 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [12/100] train_loss: 0.3046, val_loss: 0.4421, val_mse: 4.4778, lr: 0.000091, 11.9s
2023-09-22 19:13:33 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [13/100] train_loss: 0.2974, val_loss: 0.3763, val_mse: 3.8307, lr: 0.000090, 11.8s
2023-09-22 19:13:46 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [14/100] train_loss: 0.2772, val_loss: 0.3950, val_mse: 4.0040, lr: 0.000089, 11.9s
2023-09-22 19:13:58 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [15/100] train_loss: 0.2730, val_loss: 0.4242, val_mse: 4.3101, lr: 0.000088, 11.9s
2023-09-22 19:14:10 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [16/100] train_loss: 0.2558, val_loss: 0.4160, val_mse: 4.2408, lr: 0.000087, 12.0s
2023-09-22 19:14:22 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [17/100] train_loss: 0.2567, val_loss: 0.4056, val_mse: 4.1271, lr: 0.000086, 11.9s
2023-09-22 19:14:34 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [18/100] train_loss: 0.2749, val_loss: 0.4220, val_mse: 4.3032, lr: 0.000085, 12.0s
2023-09-22 19:14:46 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [19/100] train_loss: 0.2308, val_loss: 0.4472, val_mse: 4.5531, lr: 0.000084, 12.0s
2023-09-22 19:14:58 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [20/100] train_loss: 0.2274, val_loss: 0.4076, val_mse: 4.1538, lr: 0.000082, 12.0s
2023-09-22 19:15:10 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [21/100] train_loss: 0.2193, val_loss: 0.4107, val_mse: 4.1830, lr: 0.000081, 11.9s
2023-09-22 19:15:22 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [22/100] train_loss: 0.2067, val_loss: 0.4377, val_mse: 4.4658, lr: 0.000080, 11.8s
2023-09-22 19:15:34 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [23/100] train_loss: 0.2061, val_loss: 0.4247, val_mse: 4.3193, lr: 0.000079, 11.8s
2023-09-22 19:15:34 | unimol/utils/metrics.py | 228 | WARNING | Uni-Mol(QSAR) | Early stopping at epoch: 23
2023-09-22 19:15:34 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2023-09-22 19:15:35 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 1, result {'mse': 3.8306642, 'mae': 1.2842724, 'pearsonr': 0.7984630120060101, 'spearmanr': 0.8129191592212157, 'r2': 0.6125218850491334}
2023-09-22 19:15:36 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2023-09-22 19:15:48 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/100] train_loss: 1.0129, val_loss: 0.8174, val_mse: 8.1871, lr: 0.000033, 11.9s
2023-09-22 19:16:00 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/100] train_loss: 0.9080, val_loss: 1.0782, val_mse: 10.8667, lr: 0.000067, 11.9s
2023-09-22 19:16:12 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/100] train_loss: 0.7499, val_loss: 0.5969, val_mse: 5.9840, lr: 0.000100, 11.9s
2023-09-22 19:16:25 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/100] train_loss: 0.6443, val_loss: 0.5722, val_mse: 5.7672, lr: 0.000099, 11.9s
2023-09-22 19:16:37 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/100] train_loss: 0.5465, val_loss: 0.4717, val_mse: 4.7685, lr: 0.000098, 12.0s
2023-09-22 19:16:50 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/100] train_loss: 0.5190, val_loss: 0.6011, val_mse: 6.1064, lr: 0.000097, 11.9s
2023-09-22 19:17:02 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/100] train_loss: 0.4702, val_loss: 0.4719, val_mse: 4.7694, lr: 0.000096, 11.9s
2023-09-22 19:17:14 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/100] train_loss: 0.4075, val_loss: 0.4831, val_mse: 4.8847, lr: 0.000095, 11.9s
2023-09-22 19:17:26 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [9/100] train_loss: 0.3899, val_loss: 0.4834, val_mse: 4.8801, lr: 0.000094, 11.9s
2023-09-22 19:17:38 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [10/100] train_loss: 0.3452, val_loss: 0.5108, val_mse: 5.1400, lr: 0.000093, 12.0s
2023-09-22 19:17:50 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [11/100] train_loss: 0.3351, val_loss: 0.5736, val_mse: 5.8102, lr: 0.000092, 11.9s
2023-09-22 19:18:02 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [12/100] train_loss: 0.3023, val_loss: 0.5212, val_mse: 5.2980, lr: 0.000091, 11.8s
2023-09-22 19:18:14 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [13/100] train_loss: 0.2765, val_loss: 0.4269, val_mse: 4.3433, lr: 0.000090, 11.9s
2023-09-22 19:18:26 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [14/100] train_loss: 0.2752, val_loss: 0.5402, val_mse: 5.4705, lr: 0.000089, 11.8s
2023-09-22 19:18:38 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [15/100] train_loss: 0.2493, val_loss: 0.5041, val_mse: 5.1011, lr: 0.000088, 11.8s
2023-09-22 19:18:50 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [16/100] train_loss: 0.2480, val_loss: 0.4610, val_mse: 4.6911, lr: 0.000087, 11.9s
2023-09-22 19:19:02 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [17/100] train_loss: 0.2248, val_loss: 0.4955, val_mse: 5.0323, lr: 0.000086, 11.8s
2023-09-22 19:19:14 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [18/100] train_loss: 0.2239, val_loss: 0.4587, val_mse: 4.6626, lr: 0.000085, 12.0s
2023-09-22 19:19:25 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [19/100] train_loss: 0.2096, val_loss: 0.4920, val_mse: 5.0106, lr: 0.000084, 11.9s
2023-09-22 19:19:37 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [20/100] train_loss: 0.2117, val_loss: 0.4504, val_mse: 4.5798, lr: 0.000082, 11.9s
2023-09-22 19:19:49 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [21/100] train_loss: 0.2101, val_loss: 0.4585, val_mse: 4.6587, lr: 0.000081, 11.9s
2023-09-22 19:20:01 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [22/100] train_loss: 0.2064, val_loss: 0.5251, val_mse: 5.3517, lr: 0.000080, 11.8s
2023-09-22 19:20:13 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [23/100] train_loss: 0.2128, val_loss: 0.5258, val_mse: 5.3472, lr: 0.000079, 11.9s
2023-09-22 19:20:13 | unimol/utils/metrics.py | 228 | WARNING | Uni-Mol(QSAR) | Early stopping at epoch: 23
2023-09-22 19:20:13 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2023-09-22 19:20:14 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 2, result {'mse': 4.3433475, 'mae': 1.3079208, 'pearsonr': 0.7468925604362612, 'spearmanr': 0.7679559519480207, 'r2': 0.5203490073181833}
2023-09-22 19:20:15 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2023-09-22 19:20:27 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/100] train_loss: 0.9833, val_loss: 0.8373, val_mse: 8.4903, lr: 0.000033, 11.9s
2023-09-22 19:20:40 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/100] train_loss: 0.8575, val_loss: 0.7316, val_mse: 7.4045, lr: 0.000067, 11.9s
2023-09-22 19:20:52 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/100] train_loss: 0.7661, val_loss: 0.6164, val_mse: 6.2520, lr: 0.000100, 11.8s
2023-09-22 19:21:05 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/100] train_loss: 0.6343, val_loss: 0.5412, val_mse: 5.4598, lr: 0.000099, 12.0s
2023-09-22 19:21:17 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/100] train_loss: 0.5211, val_loss: 0.5810, val_mse: 5.8374, lr: 0.000098, 12.0s
2023-09-22 19:21:29 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/100] train_loss: 0.4703, val_loss: 0.5242, val_mse: 5.2011, lr: 0.000097, 12.0s
2023-09-22 19:21:42 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/100] train_loss: 0.4219, val_loss: 0.6694, val_mse: 6.6685, lr: 0.000096, 11.9s
2023-09-22 19:21:54 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/100] train_loss: 0.4067, val_loss: 0.4979, val_mse: 4.9808, lr: 0.000095, 11.9s
2023-09-22 19:22:07 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [9/100] train_loss: 0.3842, val_loss: 0.5518, val_mse: 5.4776, lr: 0.000094, 12.0s
2023-09-22 19:22:18 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [10/100] train_loss: 0.3520, val_loss: 0.6822, val_mse: 6.7683, lr: 0.000093, 11.9s
2023-09-22 19:22:30 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [11/100] train_loss: 0.3337, val_loss: 0.5797, val_mse: 5.7295, lr: 0.000092, 11.9s
2023-09-22 19:22:42 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [12/100] train_loss: 0.2961, val_loss: 0.5321, val_mse: 5.2777, lr: 0.000091, 11.9s
2023-09-22 19:22:54 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [13/100] train_loss: 0.2754, val_loss: 0.5334, val_mse: 5.3071, lr: 0.000090, 12.0s
2023-09-22 19:23:06 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [14/100] train_loss: 0.2778, val_loss: 0.5531, val_mse: 5.5215, lr: 0.000089, 12.0s
2023-09-22 19:23:18 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [15/100] train_loss: 0.2679, val_loss: 0.6056, val_mse: 6.0212, lr: 0.000088, 12.0s
2023-09-22 19:23:30 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [16/100] train_loss: 0.2471, val_loss: 0.5306, val_mse: 5.3044, lr: 0.000087, 12.0s
2023-09-22 19:23:42 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [17/100] train_loss: 0.2550, val_loss: 0.5234, val_mse: 5.2464, lr: 0.000086, 11.9s
2023-09-22 19:23:54 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [18/100] train_loss: 0.2293, val_loss: 0.5260, val_mse: 5.2867, lr: 0.000085, 12.0s
2023-09-22 19:23:54 | unimol/utils/metrics.py | 228 | WARNING | Uni-Mol(QSAR) | Early stopping at epoch: 18
2023-09-22 19:23:55 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2023-09-22 19:23:57 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 3, result {'mse': 4.980758, 'mae': 1.5096344, 'pearsonr': 0.7281056072950604, 'spearmanr': 0.7481936198904351, 'r2': 0.49684808545844905}
2023-09-22 19:23:57 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2023-09-22 19:24:09 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/100] train_loss: 0.9610, val_loss: 0.9468, val_mse: 9.5041, lr: 0.000033, 11.9s
2023-09-22 19:24:22 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/100] train_loss: 0.8718, val_loss: 0.7699, val_mse: 7.6537, lr: 0.000067, 11.9s
2023-09-22 19:24:35 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/100] train_loss: 0.7561, val_loss: 0.6932, val_mse: 6.7205, lr: 0.000100, 11.9s
2023-09-22 19:24:47 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/100] train_loss: 0.6207, val_loss: 0.6451, val_mse: 6.4776, lr: 0.000099, 12.0s
2023-09-22 19:25:00 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/100] train_loss: 0.5404, val_loss: 0.5107, val_mse: 5.1047, lr: 0.000098, 11.9s
2023-09-22 19:25:12 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/100] train_loss: 0.4820, val_loss: 0.4999, val_mse: 4.9507, lr: 0.000097, 11.9s
2023-09-22 19:25:25 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/100] train_loss: 0.4486, val_loss: 0.4693, val_mse: 4.7448, lr: 0.000096, 12.0s
2023-09-22 19:25:38 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/100] train_loss: 0.3957, val_loss: 0.4628, val_mse: 4.5388, lr: 0.000095, 11.9s
2023-09-22 19:25:50 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [9/100] train_loss: 0.3686, val_loss: 0.4109, val_mse: 4.0882, lr: 0.000094, 12.0s
2023-09-22 19:26:03 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [10/100] train_loss: 0.3489, val_loss: 0.3871, val_mse: 3.8486, lr: 0.000093, 11.9s
2023-09-22 19:26:16 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [11/100] train_loss: 0.3346, val_loss: 0.3957, val_mse: 3.9514, lr: 0.000092, 11.9s
2023-09-22 19:26:28 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [12/100] train_loss: 0.2936, val_loss: 0.4204, val_mse: 4.1691, lr: 0.000091, 11.9s
2023-09-22 19:26:39 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [13/100] train_loss: 0.2901, val_loss: 0.4513, val_mse: 4.5146, lr: 0.000090, 11.9s
2023-09-22 19:26:51 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [14/100] train_loss: 0.2619, val_loss: 0.4211, val_mse: 4.1273, lr: 0.000089, 11.9s
2023-09-22 19:27:03 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [15/100] train_loss: 0.2631, val_loss: 0.4851, val_mse: 4.8112, lr: 0.000088, 11.8s
2023-09-22 19:27:15 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [16/100] train_loss: 0.2399, val_loss: 0.4326, val_mse: 4.2765, lr: 0.000087, 11.9s
2023-09-22 19:27:27 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [17/100] train_loss: 0.2403, val_loss: 0.4278, val_mse: 4.2191, lr: 0.000086, 11.9s
2023-09-22 19:27:39 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [18/100] train_loss: 0.2231, val_loss: 0.4175, val_mse: 4.1792, lr: 0.000085, 11.9s
2023-09-22 19:27:51 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [19/100] train_loss: 0.2203, val_loss: 0.4306, val_mse: 4.2084, lr: 0.000084, 12.0s
2023-09-22 19:28:03 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [20/100] train_loss: 0.2208, val_loss: 0.4885, val_mse: 4.7528, lr: 0.000082, 12.0s
2023-09-22 19:28:03 | unimol/utils/metrics.py | 228 | WARNING | Uni-Mol(QSAR) | Early stopping at epoch: 20
2023-09-22 19:28:03 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2023-09-22 19:28:04 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 4, result {'mse': 3.848559, 'mae': 1.3626456, 'pearsonr': 0.8064467171602484, 'spearmanr': 0.8174274438102153, 'r2': 0.642660186891282}
2023-09-22 19:28:04 | unimol/models/nnmodel.py | 144 | INFO | Uni-Mol(QSAR) | Uni-Mol metrics score: 
{'mse': 4.16806793096518, 'mae': 1.36146153075709, 'pearsonr': 0.771143726793598, 'spearmanr': 0.7851127062164047, 'r2': 0.5823224163960501}
2023-09-22 19:28:04 | unimol/models/nnmodel.py | 145 | INFO | Uni-Mol(QSAR) | Uni-Mol & Metric result saved!
[4]
# Prediction using the SMILES CSV file input mode
clf = MolPredict(load_model='./pka')
test_path = '/bohr/pKa-51eu/v2/pKa_test.csv'
test_pred = clf.predict(test_path)
2023-09-22 19:28:04 | unimol/data/conformer.py | 62 | INFO | Uni-Mol(QSAR) | Start generating conformers...
1215it [00:05, 242.63it/s]
2023-09-22 19:28:09 | unimol/data/conformer.py | 66 | INFO | Uni-Mol(QSAR) | Failed to generate conformers for 0.00% of molecules.
2023-09-22 19:28:10 | unimol/data/conformer.py | 68 | INFO | Uni-Mol(QSAR) | Failed to generate 3d conformers for 0.00% of molecules.
2023-09-22 19:28:10 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2023-09-22 19:28:11 | unimol/models/nnmodel.py | 154 | INFO | Uni-Mol(QSAR) | start predict NNModel:unimolv1
2023-09-22 19:28:11 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2023-09-22 19:28:13 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2023-09-22 19:28:14 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2023-09-22 19:28:16 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2023-09-22 19:28:18 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2023-09-22 19:28:19 | unimol/predict.py | 66 | INFO | Uni-Mol(QSAR) | final predict metrics score: 
{'mse': 4.096291747485217, 'mae': 1.2155226306436484, 'pearsonr': 0.776462123413637, 'spearmanr': 0.8107734646652588, 'r2': 0.6000364625249144}
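To inspect individual predictions, the returned array can be attached to the test table and written out. A minimal sketch, assuming test_pred is aligned row-by-row with pKa_test.csv; the output file name is arbitrary:

# Sketch: save per-molecule predictions next to the test SMILES.
import pandas as pd

df_pred = pd.read_csv(test_path)            # the same test CSV used above
df_pred['pKa_pred'] = test_pred.flatten()   # predicted pKa per molecule
df_pred.to_csv('./pka/pKa_test_pred.csv', index=False)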
[6]
# Compute test-set metrics and plot predictions against experimental targets

from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.read_csv(test_path, header='infer')
test_target = df['TARGET'].values

rmse_test = np.sqrt(mean_squared_error(test_target, test_pred.flatten()))
R2_test = r2_score(test_target, test_pred.flatten())

fig, ax = plt.subplots(figsize=(5, 5), dpi=150)

xmin = min(test_pred.flatten().min(), test_target.min())
xmax = max(test_pred.flatten().max(), test_target.max())
ymin = xmin
ymax = xmax

ax.scatter(test_target, test_pred.flatten(), alpha=0.2, s=10, c='red', label='Test')


ax.text(0.6, 0.11, "RMSE (Test) = " + "%.3f"%(rmse_test), fontsize=10, transform=ax.transAxes)
ax.text(0.6, 0.07, "R$^{2}$ (Test) = " + "%.3f"%(R2_test), fontsize=10, transform=ax.transAxes)

plt.xlim(xmin, xmax) # set the x-axis range
plt.ylim(ymin, ymax) # set the y-axis range

ax.set_xlabel('target')
ax.set_ylabel('predict')

_ = ax.plot([xmin, xmax], [ymin, ymax], c='k', ls='--')
ax.legend(loc='upper left')

plt.show()


Summary

This tutorial demonstrates the feasibility of predicting pKa with Uni-Mol. If you want to further improve the prediction accuracy:

  1. Tune the hyperparameters carefully.
  2. The dataset collected for this demo is small; expand it with more data.
  3. Study Uni-pKa [7].