Molecular Property Prediction based on Uni-Mol
Tags: Uni-Mol, Deep Learning, AI4SCUP-OLED
Author: Yani Guan
Updated: 2024-10-24
Recommended image: Uni-Mol:unimol-qsar:v0.5
Recommended machine type: c3_m4_1 * NVIDIA T4

[1]
# image:unimol-qsar:v0.5, GPU
from unimol import MolTrain, MolPredict
/opt/conda/lib/python3.8/site-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
[2]
%%bash
# Download the sample dataset (CNS drug data), removing any old copies first
rm -f mol_test.csv mol_train.csv
wget -nv https://bohrium-example.oss-cn-zhangjiakou.aliyuncs.com/unimol-qsar/mol_test.csv
wget -nv https://bohrium-example.oss-cn-zhangjiakou.aliyuncs.com/unimol-qsar/mol_train.csv
2024-07-17 17:46:18 URL:https://bohrium-example.oss-cn-zhangjiakou.aliyuncs.com/unimol-qsar/mol_test.csv [17486/17486] -> "mol_test.csv" [1]
2024-07-17 17:46:18 URL:https://bohrium-example.oss-cn-zhangjiakou.aliyuncs.com/unimol-qsar/mol_train.csv [30600/30600] -> "mol_train.csv" [1]
[3]
# For a single-task scenario, the input file must contain two columns: SMILES and TARGET
# For a multi-task scenario, it must contain a SMILES column plus one TARGET_XX column for each property to be predicted
!head mol_train.csv
SMILES,TARGET
CC1OC(=O)CC(O)CC(O)CC(O)CCC(O)C(O)CC(=O)CC(O)C(C(O)CC(OC2OC(C)C(O)C(N)C2O)C=CC=CC=CC=CCCC=CC=CC(C)C(O)C1C)C(=O)O,0
NCCCCC(NC(CCc1ccccc1)C(=O)O)C(=O)N2CCCC2C(=O)O,0
c1ccc(cc1)C2=NCC(=O)Nc3c2cc(cc3)[N+](=O)[O],1
CCN(CC)C(C)CN1c2ccccc2Sc3c1cccc3,1
CC(CCCC(C)(C)O)C1CCC2C(=CC=C3CC(O)CC(O)C3=C)CCCC12C,0
c1ccc(cc1)C(c2ccc(cc2)Cl)N3CCN(CC3)CCOCCO,1
CCCSc1ccc2[nH]c(NC(=O)OC)nc2c1,0
NC12CC3CC(CC(C3)C1)C2,0
CC1CCc2cc(F)cc3C(=O)C(=CN1c23)C(=O)O,0
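You can check a file against the column convention in the comments above before you start training. The helper below is hypothetical: check_unimol_columns is not part of Uni-Mol. It mirrors the smiles_col='SMILES' and target_col_prefix='TARGET' settings that appear in the model config later in this notebook. The multi-task column names TARGET_0 and TARGET_1 are made up for illustration.

```python
import csv
import io

def check_unimol_columns(header):
    """Return the target columns of a CSV header (hypothetical helper).

    Assumes one 'SMILES' column plus target columns carrying the 'TARGET'
    prefix: 'TARGET' for single-task data, 'TARGET_XX' for multi-task data.
    """
    if "SMILES" not in header:
        raise ValueError("missing SMILES column")
    targets = [col for col in header if col.startswith("TARGET")]
    if not targets:
        raise ValueError("no TARGET column found")
    return targets

# The header and one row of the training file printed above
sample = "SMILES,TARGET\nNC12CC3CC(CC(C3)C1)C2,0\n"
header = next(csv.reader(io.StringIO(sample)))
print(check_unimol_columns(header))                              # ['TARGET']
print(check_unimol_columns(["SMILES", "TARGET_0", "TARGET_1"]))  # ['TARGET_0', 'TARGET_1']
```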

UniMol Training Parameters

task: Choose the task type; currently, five types are available:

  • classification: binary (0/1) classification
  • regression: single-target regression
  • multiclass: multi-class classification
  • multilabel_classification: multi-label binary (0/1) classification
  • multilabel_regression: multi-label (multi-target) regression

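metrics: The metrics to monitor during training. Separate multiple metrics with commas. If you leave this blank, the default metrics are used. With metrics='auc' (as in the training cell), early stopping and the choice of the best epoch follow the validation AUC. Supported metrics are as follows: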

  • classification: auc, auprc, log_loss, f1_score, mcc, recall, precision, cohen_kappa;
  • regression: mae, mse, rmse, r2, spearmanr;
  • multiclass: log_loss, acc;
  • multilabel_classification: log_loss, acc, auprc, cohen_kappa;
  • multilabel_regression: mse, mae, rmse, r2;
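You can encode the metric list above as a lookup table to catch typos before a long training run. This is a hypothetical check: parse_metrics is not a Uni-Mol function. It only knows the metrics documented above. The training logs show that Uni-Mol also reports extra scores such as acc and auroc. The check also does not model the defaults Uni-Mol uses for a blank string.

```python
# Supported metrics per task, copied from the list above
SUPPORTED_METRICS = {
    "classification": {"auc", "auprc", "log_loss", "f1_score", "mcc",
                       "recall", "precision", "cohen_kappa"},
    "regression": {"mae", "mse", "rmse", "r2", "spearmanr"},
    "multiclass": {"log_loss", "acc"},
    "multilabel_classification": {"log_loss", "acc", "auprc", "cohen_kappa"},
    "multilabel_regression": {"mse", "mae", "rmse", "r2"},
}

def parse_metrics(task, metrics):
    """Split a comma-separated metrics string and validate it for a task.

    A blank string returns an empty list (Uni-Mol would then use its
    defaults, which this sketch does not reproduce).
    """
    if task not in SUPPORTED_METRICS:
        raise ValueError(f"unknown task: {task}")
    names = [m.strip() for m in metrics.split(",") if m.strip()]
    unknown = [m for m in names if m not in SUPPORTED_METRICS[task]]
    if unknown:
        raise ValueError(f"unsupported metrics for {task}: {unknown}")
    return names

print(parse_metrics("classification", "auc, mcc"))  # ['auc', 'mcc']
print(parse_metrics("regression", ""))              # []
```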

data_type: The type of input data. Currently, only molecules are supported. Other data types, such as proteins and crystals, are planned.

split: UniMol uses 5-fold cross-validation by default. Folds can be assigned either randomly or by molecular scaffold. With scaffold splitting, molecules that share a scaffold always land in the same fold.
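The two split strategies can be sketched as follows:

  • random_kfold shuffles indices with a fixed seed (42, like split_seed in the config shown later) and deals them into five folds.
  • group_kfold keeps every group together, which is the idea behind scaffold splitting.

Both functions are illustrative assumptions, not Uni-Mol's implementation. The scaffold keys are assumed to be computed beforehand, for example as Bemis-Murcko scaffolds from RDKit. The letters used here are placeholders.

```python
import random
from collections import defaultdict

def random_kfold(n, k=5, seed=42):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def group_kfold(groups, k=5):
    """Assign whole groups (e.g. scaffolds) to folds: largest groups first,
    each into the currently smallest fold, so no group spans two folds."""
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)
    folds = [[] for _ in range(k)]
    for g in sorted(members, key=lambda g: len(members[g]), reverse=True):
        smallest = min(range(k), key=lambda f: len(folds[f]))
        folds[smallest].extend(members[g])
    return folds

folds = random_kfold(700)           # 700 training molecules, as in this notebook
print([len(f) for f in folds])      # [140, 140, 140, 140, 140]

scaffolds = ["A", "A", "A", "B", "B", "C", "D", "E", "E", "F"]
print(group_kfold(scaffolds, k=3))  # [[0, 1, 2, 9], [3, 4, 5], [7, 8, 6]]
```

Group splitting gives up equal fold sizes in exchange for a harder, more realistic test: the model never sees a validation molecule's scaffold during training.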

save_path: The output directory for the current task (trained models, metrics, and config). Existing files in it are overwritten by default.

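epochs, learning_rate, batch_size, early_stopping: Standard training hyperparameters:

  • epochs: the maximum number of epochs
  • learning_rate: the learning rate
  • batch_size: the batch size
  • early_stopping: the early-stopping patience, i.e. how many epochs without improvement in the monitored metric are tolerated before training stops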
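The training logs further down are consistent with a simple patience rule on the monitored metric. In fold 0, the best val_auc (0.9368) is reached at epoch 3 and training stops at epoch 8. The reported fold AUC is the epoch-3 value. The function below reproduces that rule as a sketch. The assumption that Uni-Mol compares each epoch only against the best score so far is inferred from the logs, not taken from its source code.

```python
def early_stopping(scores, patience):
    """Return (best_epoch, stop_epoch), both 1-based, for a metric to maximise.

    Training stops once `patience` epochs have passed since the last new best;
    stop_epoch is None if that never happens within `scores`.
    """
    best, best_epoch = float("-inf"), 0
    for epoch, score in enumerate(scores, start=1):
        if score > best:
            best, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, None

# val_auc values of fold 0 from the training log
fold0 = [0.8720, 0.9088, 0.9368, 0.9322, 0.9251, 0.9191, 0.9310, 0.9088]
print(early_stopping(fold0, patience=5))  # (3, 8)
```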

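remove_hs: Whether to remove hydrogen atoms from the molecules. Hydrogens are kept by default; removing them significantly reduces memory usage. The setting also selects the pretrained weights. In the logs below, mol_pre_no_h_220816.pt is loaded with remove_hs=True, and mol_pre_all_h_220816.pt is loaded when remove_hs is left at its default.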
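Why does removing hydrogens save so much memory? Uni-Mol builds features for every pair of atoms from their 3D distances, so the pair features alone grow roughly with the square of the atom count. The sketch below uses a hypothetical helper, strip_hydrogens. It removes H atoms from one molecule in the atoms/coordinates layout used by the custom-data cells later on, and compares pair counts. It illustrates the idea; it is not how Uni-Mol removes hydrogens internally.

```python
def strip_hydrogens(atoms, coordinates):
    """Drop hydrogen atoms and their coordinate rows from one conformer."""
    keep = [i for i, a in enumerate(atoms) if a != "H"]
    return [atoms[i] for i in keep], [coordinates[i] for i in keep]

# Same 6-atom layout as the fake molecules in the custom-data cells
atoms = ["C", "C", "H", "H", "H", "H"]
coords = [[0.0, 0.0, float(i)] for i in range(6)]

heavy_atoms, heavy_coords = strip_hydrogens(atoms, coords)
print(heavy_atoms)                              # ['C', 'C']
print(len(atoms) ** 2, len(heavy_atoms) ** 2)   # 36 4  (atom pairs before/after)
```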

[4]
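# Initialize the training model
clf = MolTrain(task='classification',   # binary 0/1 labels
               data_type='molecule',
               epochs=20,
               learning_rate=0.0001,
               batch_size=16,
               early_stopping=5,         # patience: epochs without improvement before stopping
               metrics='auc',            # metric monitored for early stopping
               split='random',           # random 5-fold cross-validation
               save_path='./exp',        # output directory, overwritten if it exists
               remove_hs=True,           # drop hydrogens to reduce memory usage
               )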

Currently, there are two ways to supply training data:

  • the path to a CSV file for the training set, which must contain the SMILES and TARGET columns;
  • custom conformations: a dictionary with an 'atoms' list (the element symbols of each molecule) and a 'coordinates' list (one 3D coordinate array per molecule), plus a 'target' entry for training.
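A malformed custom-conformation dictionary tends to fail deep inside training, so a quick consistency check up front can save time. The hypothetical validate_custom_data below checks the layout used by the custom-data cells later in this notebook:

  • every molecule needs one [x, y, z] row per atom symbol;
  • when training, every molecule needs a target.

It works with plain lists as well as the NumPy arrays those cells use. The water geometry is only illustrative.

```python
def validate_custom_data(data, require_target=True):
    """Check a custom-conformation dict and return the number of molecules.

    Expected layout (as in this notebook): data['atoms'][i] is a list of
    element symbols, data['coordinates'][i] has one [x, y, z] row per atom,
    and data['target'] has one label per molecule when training.
    """
    atoms, coords = data["atoms"], data["coordinates"]
    if len(atoms) != len(coords):
        raise ValueError("atoms and coordinates differ in molecule count")
    for i, (mol_atoms, mol_coords) in enumerate(zip(atoms, coords)):
        if len(mol_atoms) != len(mol_coords):
            raise ValueError(f"molecule {i}: {len(mol_atoms)} atoms but "
                             f"{len(mol_coords)} coordinate rows")
        if any(len(row) != 3 for row in mol_coords):
            raise ValueError(f"molecule {i}: coordinates must be N x 3")
    if require_target and len(data.get("target", [])) != len(atoms):
        raise ValueError("target length does not match molecule count")
    return len(atoms)

water = {"atoms": [["O", "H", "H"]],
         "coordinates": [[[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]],
         "target": [1]}
print(validate_custom_data(water))  # 1
```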
[5]
# Train the model from a CSV file of SMILES strings
clf.fit('mol_train.csv')
2024-07-17 17:46:38 | unimol/data/conformer.py | 62 | INFO | Uni-Mol(QSAR) | Start generating conformers...
700it [00:15, 45.53it/s]
2024-07-17 17:46:53 | unimol/data/conformer.py | 66 | INFO | Uni-Mol(QSAR) | Failed to generate conformers for 0.00% of molecules.
2024-07-17 17:46:53 | unimol/data/conformer.py | 68 | INFO | Uni-Mol(QSAR) | Failed to generate 3d conformers for 0.00% of molecules.
2024-07-17 17:46:53 | unimol/train.py | 102 | INFO | Uni-Mol(QSAR) | Create output directory: ./exp
2024-07-17 17:46:54 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_no_h_220816.pt
2024-07-17 17:46:55 | unimol/models/nnmodel.py | 103 | INFO | Uni-Mol(QSAR) | start training Uni-Mol:unimolv1
2024-07-17 17:47:11 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/20] train_loss: 0.6512, val_loss: 0.4486, val_auc: 0.8720, lr: 0.000098, 11.6s
2024-07-17 17:47:17 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/20] train_loss: 0.4759, val_loss: 0.4393, val_auc: 0.9088, lr: 0.000093, 5.5s
2024-07-17 17:47:22 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/20] train_loss: 0.4232, val_loss: 0.3040, val_auc: 0.9368, lr: 0.000088, 4.5s
2024-07-17 17:47:26 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/20] train_loss: 0.3154, val_loss: 0.2988, val_auc: 0.9322, lr: 0.000082, 4.5s
2024-07-17 17:47:31 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/20] train_loss: 0.2268, val_loss: 0.3729, val_auc: 0.9251, lr: 0.000077, 4.4s
2024-07-17 17:47:35 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/20] train_loss: 0.2273, val_loss: 0.5764, val_auc: 0.9191, lr: 0.000072, 4.4s
2024-07-17 17:47:40 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/20] train_loss: 0.1298, val_loss: 0.4541, val_auc: 0.9310, lr: 0.000067, 4.3s
2024-07-17 17:47:44 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/20] train_loss: 0.0938, val_loss: 0.6080, val_auc: 0.9088, lr: 0.000062, 4.5s
2024-07-17 17:47:44 | unimol/utils/metrics.py | 243 | WARNING | Uni-Mol(QSAR) | Early stopping at epoch: 8
2024-07-17 17:47:44 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-07-17 17:47:44 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 0, result {'auc': 0.9368421052631579, 'auroc': 0.9368421052631579, 'auprc': 0.8823329417709477, 'log_loss': 0.3080171648279897, 'acc': 0.8785714285714286, 'f1_score': 0.8089887640449439, 'mcc': 0.7200976807742538, 'precision': 0.8181818181818182, 'recall': 0.8, 'cohen_kappa': 0.72}
2024-07-17 17:47:45 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_no_h_220816.pt
2024-07-17 17:47:50 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/20] train_loss: 0.5967, val_loss: 0.4289, val_auc: 0.8760, lr: 0.000098, 4.3s
2024-07-17 17:47:54 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/20] train_loss: 0.4246, val_loss: 0.3442, val_auc: 0.9160, lr: 0.000093, 4.4s
2024-07-17 17:47:59 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/20] train_loss: 0.3502, val_loss: 0.6515, val_auc: 0.9067, lr: 0.000088, 4.3s
2024-07-17 17:48:03 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/20] train_loss: 0.3070, val_loss: 0.3463, val_auc: 0.9333, lr: 0.000082, 4.2s
2024-07-17 17:48:08 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/20] train_loss: 0.2769, val_loss: 0.4444, val_auc: 0.9284, lr: 0.000077, 4.3s
2024-07-17 17:48:12 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/20] train_loss: 0.3168, val_loss: 0.3439, val_auc: 0.9324, lr: 0.000072, 4.2s
2024-07-17 17:48:16 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/20] train_loss: 0.2651, val_loss: 0.7065, val_auc: 0.9080, lr: 0.000067, 4.3s
2024-07-17 17:48:21 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/20] train_loss: 0.2476, val_loss: 0.4660, val_auc: 0.9102, lr: 0.000062, 4.3s
2024-07-17 17:48:26 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [9/20] train_loss: 0.1318, val_loss: 0.7787, val_auc: 0.9127, lr: 0.000057, 5.0s
2024-07-17 17:48:26 | unimol/utils/metrics.py | 243 | WARNING | Uni-Mol(QSAR) | Early stopping at epoch: 9
2024-07-17 17:48:26 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-07-17 17:48:26 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 1, result {'auc': 0.9333333333333333, 'auroc': 0.9333333333333333, 'auprc': 0.8867574781831005, 'log_loss': 0.35297432385518085, 'acc': 0.8928571428571429, 'f1_score': 0.8571428571428572, 'mcc': 0.7739827625318121, 'precision': 0.8181818181818182, 'recall': 0.9, 'cohen_kappa': 0.7717391304347826}
2024-07-17 17:48:27 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_no_h_220816.pt
2024-07-17 17:48:32 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/20] train_loss: 0.6556, val_loss: 0.4963, val_auc: 0.8521, lr: 0.000098, 4.4s
2024-07-17 17:48:36 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/20] train_loss: 0.4495, val_loss: 0.3652, val_auc: 0.9090, lr: 0.000093, 4.4s
2024-07-17 17:48:42 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/20] train_loss: 0.3517, val_loss: 0.6135, val_auc: 0.8999, lr: 0.000088, 4.3s
2024-07-17 17:48:47 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/20] train_loss: 0.2684, val_loss: 0.4570, val_auc: 0.9210, lr: 0.000082, 4.5s
2024-07-17 17:48:52 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/20] train_loss: 0.1819, val_loss: 0.4764, val_auc: 0.9103, lr: 0.000077, 5.1s
2024-07-17 17:48:56 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/20] train_loss: 0.1614, val_loss: 0.5266, val_auc: 0.9130, lr: 0.000072, 4.3s
2024-07-17 17:49:01 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/20] train_loss: 0.0810, val_loss: 0.6567, val_auc: 0.9099, lr: 0.000067, 4.4s
2024-07-17 17:49:05 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/20] train_loss: 0.0690, val_loss: 0.6392, val_auc: 0.9035, lr: 0.000062, 4.3s
2024-07-17 17:49:09 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [9/20] train_loss: 0.0651, val_loss: 0.8091, val_auc: 0.9069, lr: 0.000057, 4.4s
2024-07-17 17:49:09 | unimol/utils/metrics.py | 243 | WARNING | Uni-Mol(QSAR) | Early stopping at epoch: 9
2024-07-17 17:49:10 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-07-17 17:49:10 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 2, result {'auc': 0.9209692028985508, 'auroc': 0.9209692028985508, 'auprc': 0.881805333111016, 'log_loss': 0.44749031981586346, 'acc': 0.8285714285714286, 'f1_score': 0.7000000000000001, 'mcc': 0.6102458740855465, 'precision': 0.875, 'recall': 0.5833333333333334, 'cohen_kappa': 0.5866141732283465}
2024-07-17 17:49:11 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_no_h_220816.pt
2024-07-17 17:49:15 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/20] train_loss: 0.6369, val_loss: 0.6772, val_auc: 0.7854, lr: 0.000098, 4.4s
2024-07-17 17:49:20 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/20] train_loss: 0.4516, val_loss: 0.4308, val_auc: 0.8713, lr: 0.000093, 4.3s
2024-07-17 17:49:25 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/20] train_loss: 0.3334, val_loss: 0.4203, val_auc: 0.8982, lr: 0.000088, 5.2s
2024-07-17 17:49:31 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/20] train_loss: 0.2943, val_loss: 0.3751, val_auc: 0.9037, lr: 0.000082, 5.1s
2024-07-17 17:49:36 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/20] train_loss: 0.2592, val_loss: 0.4622, val_auc: 0.9086, lr: 0.000077, 4.5s
2024-07-17 17:49:41 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/20] train_loss: 0.2222, val_loss: 0.4553, val_auc: 0.9064, lr: 0.000072, 4.6s
2024-07-17 17:49:45 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/20] train_loss: 0.1986, val_loss: 0.5191, val_auc: 0.9216, lr: 0.000067, 4.6s
2024-07-17 17:49:50 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/20] train_loss: 0.1894, val_loss: 0.6887, val_auc: 0.9068, lr: 0.000062, 4.5s
2024-07-17 17:49:54 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [9/20] train_loss: 0.1149, val_loss: 0.6962, val_auc: 0.8731, lr: 0.000057, 4.4s
2024-07-17 17:49:59 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [10/20] train_loss: 0.1123, val_loss: 0.6989, val_auc: 0.8954, lr: 0.000052, 4.4s
2024-07-17 17:50:03 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [11/20] train_loss: 0.0718, val_loss: 1.0569, val_auc: 0.8954, lr: 0.000046, 4.4s
2024-07-17 17:50:08 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [12/20] train_loss: 0.0931, val_loss: 0.9347, val_auc: 0.8894, lr: 0.000041, 4.6s
2024-07-17 17:50:08 | unimol/utils/metrics.py | 243 | WARNING | Uni-Mol(QSAR) | Early stopping at epoch: 12
2024-07-17 17:50:08 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-07-17 17:50:08 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 3, result {'auc': 0.9215686274509804, 'auroc': 0.9215686274509804, 'auprc': 0.8843896223781123, 'log_loss': 0.5236389451177924, 'acc': 0.8642857142857143, 'f1_score': 0.7912087912087913, 'mcc': 0.704062425880932, 'precision': 0.9, 'recall': 0.7058823529411765, 'cohen_kappa': 0.6928406466512702}
2024-07-17 17:50:09 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_no_h_220816.pt
2024-07-17 17:50:14 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/20] train_loss: 0.7003, val_loss: 0.6448, val_auc: 0.6662, lr: 0.000098, 4.3s
2024-07-17 17:50:18 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/20] train_loss: 0.5611, val_loss: 0.5539, val_auc: 0.8098, lr: 0.000093, 4.3s
2024-07-17 17:50:23 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/20] train_loss: 0.4389, val_loss: 0.4325, val_auc: 0.9169, lr: 0.000088, 4.9s
2024-07-17 17:50:28 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/20] train_loss: 0.4042, val_loss: 0.2911, val_auc: 0.9466, lr: 0.000082, 4.4s
2024-07-17 17:50:33 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/20] train_loss: 0.3121, val_loss: 0.3371, val_auc: 0.9393, lr: 0.000077, 4.4s
2024-07-17 17:50:38 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/20] train_loss: 0.2647, val_loss: 0.3885, val_auc: 0.9313, lr: 0.000072, 4.4s
2024-07-17 17:50:42 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/20] train_loss: 0.1881, val_loss: 0.2983, val_auc: 0.9540, lr: 0.000067, 4.3s
2024-07-17 17:50:47 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/20] train_loss: 0.1685, val_loss: 0.4049, val_auc: 0.9399, lr: 0.000062, 4.5s
2024-07-17 17:50:51 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [9/20] train_loss: 0.1534, val_loss: 0.6540, val_auc: 0.8705, lr: 0.000057, 4.4s
2024-07-17 17:50:56 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [10/20] train_loss: 0.1108, val_loss: 0.3943, val_auc: 0.9260, lr: 0.000052, 4.4s
2024-07-17 17:51:00 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [11/20] train_loss: 0.0784, val_loss: 0.5554, val_auc: 0.9109, lr: 0.000046, 4.5s
2024-07-17 17:51:05 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [12/20] train_loss: 0.0742, val_loss: 0.3906, val_auc: 0.9260, lr: 0.000041, 4.4s
2024-07-17 17:51:05 | unimol/utils/metrics.py | 243 | WARNING | Uni-Mol(QSAR) | Early stopping at epoch: 12
2024-07-17 17:51:05 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-07-17 17:51:05 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 4, result {'auc': 0.9540229885057472, 'auroc': 0.9540229885057472, 'auprc': 0.9278865030074934, 'log_loss': 0.2957740822673908, 'acc': 0.9285714285714286, 'f1_score': 0.9019607843137256, 'mcc': 0.8475280627789067, 'precision': 0.9387755102040817, 'recall': 0.8679245283018868, 'cohen_kappa': 0.8459167950693375}
2024-07-17 17:51:05 | unimol/models/nnmodel.py | 144 | INFO | Uni-Mol(QSAR) | Uni-Mol metrics score: 
{'auc': 0.9139787829226659, 'auroc': 0.9139787829226659, 'auprc': 0.8727395178107266, 'log_loss': 0.3855789671768434, 'acc': 0.8785714285714286, 'f1_score': 0.8179871520342612, 'mcc': 0.7300828090459858, 'precision': 0.8681818181818182, 'recall': 0.7732793522267206, 'cohen_kappa': 0.7273393822747686}
2024-07-17 17:51:05 | unimol/models/nnmodel.py | 145 | INFO | Uni-Mol(QSAR) | Uni-Mol & Metric result saved!
2024-07-17 17:51:05 | unimol/utils/metrics.py | 260 | INFO | Uni-Mol(QSAR) | metrics for threshold: accuracy_score
2024-07-17 17:51:05 | unimol/utils/metrics.py | 274 | INFO | Uni-Mol(QSAR) | best threshold: 0.367564390756582, metrics: 0.88
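The last two log lines report a decision threshold chosen to maximise accuracy (0.3676, accuracy 0.88) instead of the usual 0.5. The search was presumably run on the out-of-fold predictions. Below is a minimal sketch of such a search. It assumes the candidate thresholds are the predicted probabilities themselves; Uni-Mol's actual candidate grid may differ, and the toy data are made up.

```python
def best_threshold(y_true, y_prob):
    """Return (threshold, accuracy) maximising accuracy of y_prob >= threshold."""
    best_t, best_acc = 0.5, -1.0
    for t in sorted(set(y_prob)):
        preds = [1 if p >= t else 0 for p in y_prob]
        acc = sum(p == y for p, y in zip(preds, y_true)) / len(y_true)
        if acc > best_acc:  # strict: ties keep the lower threshold
            best_t, best_acc = t, acc
    return best_t, best_acc

y_true = [0, 0, 1, 1, 0, 1]
y_prob = [0.10, 0.40, 0.35, 0.80, 0.20, 0.60]
print(best_threshold(y_true, y_prob))  # (0.35, 0.8333333333333334)
```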
[6]
import pandas as pd
from sklearn.metrics import roc_auc_score

# Cross-validation predictions for every training molecule
cv_results = pd.DataFrame({'pred': clf.cv_pred.flatten(),
                           'smiles': clf.data['smiles'],
                           'target': clf.data['target'].flatten()})
print(cv_results.head())
print("cross-validation result:", roc_auc_score(cv_results.target, cv_results.pred))
       pred                                             smiles  target
0  0.006084  CC1OC(=O)CC(O)CC(O)CC(O)CCC(O)C(O)CC(=O)CC(O)C...       0
1  0.009041     NCCCCC(NC(CCc1ccccc1)C(=O)O)C(=O)N2CCCC2C(=O)O       0
2  0.891787        c1ccc(cc1)C2=NCC(=O)Nc3c2cc(cc3)[N+](=O)[O]       1
3  0.088636                   CCN(CC)C(C)CN1c2ccccc2Sc3c1cccc3       1
4  0.007430  CC(CCCC(C)(C)O)C1CCC2C(=CC=C3CC(O)CC(O)C3=C)CC...       0
cross-validation result: 0.9139787829226659
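The cross-validation AUC of 0.9140 matches the overall "Uni-Mol metrics score" in the training log. It does not match the mean of the five per-fold AUCs, which is about 0.933. The difference arises because the overall score is computed once over all pooled out-of-fold predictions, and pooling also penalises differences in score scale between folds. The toy example below shows the effect with a plain rank-based AUC, written without scikit-learn. The data are made up.

```python
def roc_auc(y_true, y_score):
    """ROC AUC as the probability that a random positive scores above a
    random negative (ties count one half). O(P*N), fine for small inputs."""
    pos = [s for s, y in zip(y_score, y_true) if y == 1]
    neg = [s for s, y in zip(y_score, y_true) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Two folds that each rank perfectly, but on different score scales
print(roc_auc([0, 1], [0.1, 0.2]))                      # 1.0
print(roc_auc([0, 1], [0.8, 0.9]))                      # 1.0
print(roc_auc([0, 1, 0, 1], [0.1, 0.2, 0.8, 0.9]))      # 0.75 (pooled)
```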

UniMol Prediction Parameter Description

Prediction takes two steps: first load the trained model, then call predict on new data.

  • load_model: the directory containing the trained model, i.e. the save_path used during training;
  • besides a CSV file of SMILES, predict also accepts custom input conformations.
[7]
clf = MolPredict(load_model='./exp')
[8]
# Inspect the loaded model's settings via config, or open config.yaml in the model directory
clf.config
{'amp': True,
 'anomaly_clean': True,
 'batch_size': 16,
 'cuda': True,
 'data_type': 'molecule',
 'epochs': 20,
 'kfold': 5,
 'learning_rate': 0.0001,
 'logger_level': 1,
 'max_epochs': 100,
 'max_norm': 5.0,
 'metrics': 'auc',
 'model_name': 'unimolv1',
 'num_classes': 1,
 'patience': 5,
 'remove_hs': True,
 'seed': 42,
 'smi_strict': True,
 'smiles_col': 'SMILES',
 'split': 'random',
 'split_group_col': 'scaffold',
 'split_method': '5fold_random',
 'split_seed': 42,
 'target_col_prefix': 'TARGET',
 'target_cols': ['TARGET'],
 'target_normalize': 'auto',
 'task': 'classification',
 'use_amp': True,
 'use_cuda': True,
 'warmup_ratio': 0.03}
[9]
# Predict from a CSV file of SMILES strings
test_pred = clf.predict('mol_test.csv')
2024-07-17 17:51:05 | unimol/data/conformer.py | 62 | INFO | Uni-Mol(QSAR) | Start generating conformers...
367it [00:11, 32.00it/s]
2024-07-17 17:51:17 | unimol/data/conformer.py | 66 | INFO | Uni-Mol(QSAR) | Failed to generate conformers for 0.00% of molecules.
2024-07-17 17:51:17 | unimol/data/conformer.py | 68 | INFO | Uni-Mol(QSAR) | Failed to generate 3d conformers for 0.00% of molecules.
2024-07-17 17:51:18 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_no_h_220816.pt
2024-07-17 17:51:18 | unimol/models/nnmodel.py | 154 | INFO | Uni-Mol(QSAR) | start predict NNModel:unimolv1
2024-07-17 17:51:18 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-07-17 17:51:19 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-07-17 17:51:20 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-07-17 17:51:21 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-07-17 17:51:22 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
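The five "load model success!" lines indicate that MolPredict loads one model per cross-validation fold. This notebook does not show how their outputs are combined. The sketch below assumes the per-fold probabilities are averaged, which is a common choice. The helper name and the probabilities are hypothetical.

```python
def average_fold_predictions(fold_preds):
    """Average per-molecule predictions across cross-validation fold models.

    fold_preds: one list per fold model, each with one prediction per molecule.
    """
    n = len(fold_preds[0])
    if any(len(p) != n for p in fold_preds):
        raise ValueError("every fold model must score the same molecules")
    return [sum(column) / len(fold_preds) for column in zip(*fold_preds)]

# Hypothetical probabilities from 5 fold models for 2 molecules
fold_preds = [[0.1, 0.7], [0.2, 0.6], [0.0, 0.8], [0.1, 0.7], [0.1, 0.7]]
print(average_fold_predictions(fold_preds))  # approximately [0.1, 0.7]
```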
                                                    
[10]
test_results = pd.DataFrame({'pred': test_pred.flatten(),
                             'smiles': clf.datahub.data['smiles']})
print(test_results.head())
       pred                                             smiles
0  0.011400  CC(CCC(=O)O)C1CCC2C3C(CC(=O)C12C)C4(C)CCC(=O)C...
1  0.661172       CC(=O)c1ccc2c(c1)Sc3ccccc3N2CCCN4CCN(CC4)CCO
2  0.030377  CCCN(CCC)C(=O)C(CCC(=O)OCCCN1CCN(CCOC(=O)Cc2c(...
3  0.011402  CC(C)CCCC(C)CCCC(C)CCCC1(C)CCc2c(C)c(O)c(C)c(C...
4  0.682624                       CCCN(CCC)CCc1cccc2c1CC(=O)N2
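For classification, test_pred holds probabilities rather than labels. To get 0/1 labels, apply a cut-off: either 0.5, or the accuracy-optimal threshold reported at the end of training (0.3676 here). This notebook does not show whether MolPredict applies that threshold itself, so the sketch applies it explicitly. The helper to_labels is hypothetical.

```python
def to_labels(probs, threshold=0.5):
    """Convert predicted probabilities into 0/1 class labels."""
    return [int(p >= threshold) for p in probs]

# First five test-set predictions printed above
probs = [0.011400, 0.661172, 0.030377, 0.011402, 0.682624]
print(to_labels(probs))                                # [0, 1, 0, 0, 1]
print(to_labels(probs, threshold=0.367564390756582))   # [0, 1, 0, 0, 1]
```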
[11]
# Train on custom conformations, using randomly generated (fake) data as an example.
# Each molecule is a list of atom symbols plus an (n_atoms, 3) coordinate array.
# Labels and coordinates are random, so the AUC should stay near chance level.
import numpy as np
custom_data = {'target': np.random.randint(2, size=100),
               'atoms': [['C', 'C', 'H', 'H', 'H', 'H'] for _ in range(100)],
               'coordinates': [np.random.randn(6, 3) for _ in range(100)],
               }

# remove_hs is left at its default (hydrogens kept)
clf_fake = MolTrain(task='classification',
                    data_type='molecule',
                    epochs=1,
                    learning_rate=0.0001,
                    batch_size=1,
                    early_stopping=1,
                    metrics='auc',
                    split='random',
                    save_path='./exp_fake',
                    )
clf_fake.fit(custom_data)
2024-07-17 17:51:23 | unimol/train.py | 102 | INFO | Uni-Mol(QSAR) | Create output directory: ./exp_fake
2024-07-17 17:51:24 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2024-07-17 17:51:25 | unimol/models/nnmodel.py | 103 | INFO | Uni-Mol(QSAR) | start training Uni-Mol:unimolv1
2024-07-17 17:51:34 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/1] train_loss: 1.0921, val_loss: 0.7354, val_auc: 0.4062, lr: 0.000000, 9.4s
2024-07-17 17:51:35 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-07-17 17:51:35 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 0, result {'auc': 0.40625, 'auroc': 0.40625, 'auprc': 0.6417324104630607, 'log_loss': 0.73536017537117, 'acc': 0.6, 'f1_score': 0.7499999999999999, 'mcc': 0.0, 'precision': 0.6, 'recall': 1.0, 'cohen_kappa': 0.0}
2024-07-17 17:51:36 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2024-07-17 17:51:44 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/1] train_loss: 0.8989, val_loss: 0.8147, val_auc: 0.3077, lr: 0.000000, 7.3s
2024-07-17 17:51:44 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-07-17 17:51:44 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 1, result {'auc': 0.3076923076923077, 'auroc': 0.3076923076923077, 'auprc': 0.5976957055011354, 'log_loss': 0.8146905124187469, 'acc': 0.65, 'f1_score': 0.787878787878788, 'mcc': 0.0, 'precision': 0.65, 'recall': 1.0, 'cohen_kappa': 0.0}
2024-07-17 17:51:45 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2024-07-17 17:51:53 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/1] train_loss: 0.9036, val_loss: 1.1799, val_auc: 0.3900, lr: 0.000000, 7.5s
2024-07-17 17:51:53 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-07-17 17:51:53 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 2, result {'auc': 0.39, 'auroc': 0.39, 'auprc': 0.4560698125404008, 'log_loss': 1.1798837244510652, 'acc': 0.5, 'f1_score': 0.6666666666666666, 'mcc': 0.0, 'precision': 0.5, 'recall': 1.0, 'cohen_kappa': 0.0}
2024-07-17 17:51:54 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2024-07-17 17:52:02 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/1] train_loss: 1.0081, val_loss: 0.8340, val_auc: 0.6364, lr: 0.000000, 8.1s
2024-07-17 17:52:03 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-07-17 17:52:03 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 3, result {'auc': 0.6363636363636364, 'auroc': 0.6363636363636364, 'auprc': 0.743450096437009, 'log_loss': 0.8340079627931118, 'acc': 0.55, 'f1_score': 0.7096774193548387, 'mcc': 0.0, 'precision': 0.55, 'recall': 1.0, 'cohen_kappa': 0.0}
2024-07-17 17:52:04 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2024-07-17 17:52:11 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/1] train_loss: 1.0388, val_loss: 1.3238, val_auc: 0.1500, lr: 0.000000, 7.2s
2024-07-17 17:52:12 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-07-17 17:52:12 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 4, result {'auc': 0.15, 'auroc': 0.15, 'auprc': 0.3651321932281684, 'log_loss': 1.32375649176538, 'acc': 0.5, 'f1_score': 0.6666666666666666, 'mcc': 0.0, 'precision': 0.5, 'recall': 1.0, 'cohen_kappa': 0.0}
2024-07-17 17:52:12 | unimol/models/nnmodel.py | 144 | INFO | Uni-Mol(QSAR) | Uni-Mol metrics score: 
{'auc': 0.4330357142857143, 'auroc': 0.4330357142857143, 'auprc': 0.5002003789988358, 'log_loss': 0.9775397733598947, 'acc': 0.56, 'f1_score': 0.717948717948718, 'mcc': 0.0, 'precision': 0.56, 'recall': 1.0, 'cohen_kappa': 0.0}
2024-07-17 17:52:12 | unimol/models/nnmodel.py | 145 | INFO | Uni-Mol(QSAR) | Uni-Mol & Metric result saved!
2024-07-17 17:52:12 | unimol/utils/metrics.py | 260 | INFO | Uni-Mol(QSAR) | metrics for threshold: accuracy_score
2024-07-17 17:52:12 | unimol/utils/metrics.py | 274 | INFO | Uni-Mol(QSAR) | best threshold: 0.7384856939315796, metrics: 0.55
[12]
# Predict with custom conformations (fake data again); no 'target' entry is needed for prediction
import numpy as np
custom_data = {
    'atoms': [['C', 'C', 'H', 'H', 'H', 'H'] for _ in range(100)],
    'coordinates': [np.random.randn(6, 3) for _ in range(100)],
}

clf_fake = MolPredict(load_model='./exp_fake')
fake_predict = clf_fake.predict(custom_data)
2024-07-17 17:52:13 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2024-07-17 17:52:13 | unimol/models/nnmodel.py | 154 | INFO | Uni-Mol(QSAR) | start predict NNModel:unimolv1
2024-07-17 17:52:13 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-07-17 17:52:15 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-07-17 17:52:16 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-07-17 17:52:18 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-07-17 17:52:19 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
                                                     