Molecular Property Prediction Based on Uni-Mol
Yani Guan
Recommended image: Uni-Mol:unimol-qsar:v0.5
Recommended machine type: c3_m4_1 * NVIDIA T4
- Open source code address: https://github.com/deepmodeling/Uni-Mol
- Paper address: https://openreview.net/forum?id=6K2RM6wVqKu
[1]
# Select the image: unimol-qsar:v0.5, choose the GPU machine type
# Import unimol
from unimol import MolTrain, MolPredict
/opt/conda/lib/python3.8/site-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
[2]
%%bash
# download sample dataset, CNS drug data
rm -rf mol_test.csv
rm -rf mol_train.csv
wget -nv https://bohrium-example.oss-cn-zhangjiakou.aliyuncs.com/unimol-qsar/mol_test.csv
wget -nv https://bohrium-example.oss-cn-zhangjiakou.aliyuncs.com/unimol-qsar/mol_train.csv
2024-07-17 17:46:18 URL:https://bohrium-example.oss-cn-zhangjiakou.aliyuncs.com/unimol-qsar/mol_test.csv [17486/17486] -> "mol_test.csv" [1]
2024-07-17 17:46:18 URL:https://bohrium-example.oss-cn-zhangjiakou.aliyuncs.com/unimol-qsar/mol_train.csv [30600/30600] -> "mol_train.csv" [1]
[3]
# For a single-task scenario, the input file needs to contain two columns: SMILES and TARGET.
# For a multi-task scenario, the input file needs to contain columns: SMILES and TARGET_XX, where TARGET_XX corresponds to the column(s) that need to be predicted.
!head mol_train.csv
SMILES,TARGET
CC1OC(=O)CC(O)CC(O)CC(O)CCC(O)C(O)CC(=O)CC(O)C(C(O)CC(OC2OC(C)C(O)C(N)C2O)C=CC=CC=CC=CCCC=CC=CC(C)C(O)C1C)C(=O)O,0
NCCCCC(NC(CCc1ccccc1)C(=O)O)C(=O)N2CCCC2C(=O)O,0
c1ccc(cc1)C2=NCC(=O)Nc3c2cc(cc3)[N+](=O)[O],1
CCN(CC)C(C)CN1c2ccccc2Sc3c1cccc3,1
CC(CCCC(C)(C)O)C1CCC2C(=CC=C3CC(O)CC(O)C3=C)CCCC12C,0
c1ccc(cc1)C(c2ccc(cc2)Cl)N3CCN(CC3)CCOCCO,1
CCCSc1ccc2[nH]c(NC(=O)OC)nc2c1,0
NC12CC3CC(CC(C3)C1)C2,0
CC1CCc2cc(F)cc3C(=O)C(=CN1c23)C(=O)O,0
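A file in this layout can be written with Python's standard csv module. The sketch below is only an illustration: the helper name write_dataset, the output file names, the TARGET_A / TARGET_B column names and the multi-task values are all hypothetical. The two SMILES strings and their single-task labels are taken from the sample rows above.

```python
import csv

def write_dataset(path, smiles, targets):
    """Write a SMILES column plus one or more TARGET* columns to a CSV file.

    `targets` maps a column name ('TARGET', 'TARGET_A', ...) to a list of
    values with the same length as `smiles`.
    """
    names = list(targets)
    for name in names:
        if not name.startswith("TARGET"):
            raise ValueError(f"target column {name!r} must start with 'TARGET'")
        if len(targets[name]) != len(smiles):
            raise ValueError(f"column {name!r} does not match the number of molecules")
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["SMILES", *names])
        for i, smi in enumerate(smiles):
            writer.writerow([smi, *(targets[name][i] for name in names)])

smiles = ["NC12CC3CC(CC(C3)C1)C2", "CCN(CC)C(C)CN1c2ccccc2Sc3c1cccc3"]

# Single-task layout: SMILES,TARGET
write_dataset("single_task_example.csv", smiles, {"TARGET": [0, 1]})

# Multi-task layout: SMILES,TARGET_A,TARGET_B (hypothetical columns and values)
write_dataset("multi_task_example.csv", smiles,
              {"TARGET_A": [0, 1], "TARGET_B": [1.5, 0.2]})
```

The prefix check mirrors the TARGET / TARGET_XX naming rule described above; it does not reproduce any validation the library itself may perform.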
Uni-Mol Training Parameter Descriptions
task: The task type. Five types are currently supported:
- classification: 0/1 classification
- regression: regression
- multiclass: multi-class classification
- multilabel_classification: multi-label 0/1 classification
- multilabel_regression: multi-label regression
metrics: Metrics that the model should optimize. Multiple metrics can be separated by commas, and if left blank, defaults will be used. Supported metrics are as follows:
- classification: auc, auprc, log_loss, f1_score, mcc, recall, precision, cohen_kappa;
- regression: mae, mse, rmse, r2, spearmanr;
- multiclass: log_loss, acc;
- multilabel_classification: log_loss, acc, auprc, cohen_kappa;
- multilabel_regression: mse, mae, rmse, r2;
data_type: Type of input data. Only molecules are currently supported; other data sources such as proteins and crystals are planned;
split: Uni-Mol uses 5-fold cross-validation by default. Both random and scaffold-based splitting are supported;
save_path: The output path for the current task. Existing files there are overwritten by default;
epochs, learning_rate, batch_size, early_stopping: Training hyperparameters;
remove_hs: Whether to remove hydrogens from molecules. Hydrogens are kept by default; removing them significantly reduces memory usage.
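Because each task accepts only its own set of metrics, it can help to check a metrics string before starting a long run. The following is a minimal sketch: the table is transcribed from the list above, and check_metrics is a hypothetical helper, not part of the unimol package. The library may accept further names that are not listed here.

```python
# Metric names per task, transcribed from the parameter description above.
SUPPORTED_METRICS = {
    "classification": {"auc", "auprc", "log_loss", "f1_score", "mcc",
                       "recall", "precision", "cohen_kappa"},
    "regression": {"mae", "mse", "rmse", "r2", "spearmanr"},
    "multiclass": {"log_loss", "acc"},
    "multilabel_classification": {"log_loss", "acc", "auprc", "cohen_kappa"},
    "multilabel_regression": {"mse", "mae", "rmse", "r2"},
}

def check_metrics(task, metrics):
    """Parse a comma-separated metrics string and validate it for `task`.

    Returns the list of metric names; an empty list means "use the defaults".
    Raises ValueError for an unknown task or an unsupported metric.
    """
    if task not in SUPPORTED_METRICS:
        raise ValueError(f"unknown task: {task!r}")
    names = [m.strip() for m in metrics.split(",") if m.strip()]
    unsupported = [m for m in names if m not in SUPPORTED_METRICS[task]]
    if unsupported:
        raise ValueError(f"{unsupported} not listed for task {task!r}")
    return names

print(check_metrics("classification", "auc, mcc"))
```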
[4]
### training model initialization
clf = MolTrain(task='classification',
               data_type='molecule',
               epochs=20,
               learning_rate=0.0001,
               batch_size=16,
               early_stopping=5,
               metrics='auc',
               split='random',
               save_path='./exp',
               remove_hs=True,
               )
Model training currently accepts input in two ways:
- Provide the corresponding csv file path for the training set, which must contain two columns: SMILES and TARGET;
- Custom conformation method: You need to pass a dictionary that contains atoms and coordinates.
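For the custom conformation route, each molecule needs one list of element symbols and one matching set of 3D coordinates. The check below is a small sketch of that consistency requirement. The key names atoms, coordinates and target follow the custom-data cells later in this notebook; check_custom_data is a hypothetical helper, and the water geometry and its label are illustrative values only. Nested lists are used to keep the sketch dependency-free, whereas the notebook itself passes NumPy arrays.

```python
def check_custom_data(data, need_target=True):
    """Verify that atoms and coordinates line up; return the molecule count."""
    atoms, coords = data["atoms"], data["coordinates"]
    if len(atoms) != len(coords):
        raise ValueError("atoms and coordinates must list the same molecules")
    for i, (symbols, xyz) in enumerate(zip(atoms, coords)):
        if len(symbols) != len(xyz):
            raise ValueError(f"molecule {i}: {len(symbols)} atoms, {len(xyz)} positions")
        if any(len(point) != 3 for point in xyz):
            raise ValueError(f"molecule {i}: every position needs x, y and z")
    if need_target and len(data["target"]) != len(atoms):
        raise ValueError("target must have one entry per molecule")
    return len(atoms)

# One water-like molecule; coordinates are rough values in angstroms.
example = {
    "atoms": [["O", "H", "H"]],
    "coordinates": [[[0.000, 0.000, 0.000],
                     [0.957, 0.000, 0.000],
                     [-0.240, 0.927, 0.000]]],
    "target": [1],  # hypothetical label
}
print(check_custom_data(example))
```

For prediction the target key is absent, so the same check would be called with need_target=False.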
[5]
### Model Training - Using SMILES File Method
clf.fit('mol_train.csv')
2024-07-17 17:46:38 | unimol/data/conformer.py | 62 | INFO | Uni-Mol(QSAR) | Start generating conformers...
700it [00:15, 45.53it/s]
2024-07-17 17:46:53 | unimol/data/conformer.py | 66 | INFO | Uni-Mol(QSAR) | Failed to generate conformers for 0.00% of molecules.
2024-07-17 17:46:53 | unimol/data/conformer.py | 68 | INFO | Uni-Mol(QSAR) | Failed to generate 3d conformers for 0.00% of molecules.
2024-07-17 17:46:53 | unimol/train.py | 102 | INFO | Uni-Mol(QSAR) | Create output directory: ./exp
2024-07-17 17:46:54 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_no_h_220816.pt
2024-07-17 17:46:55 | unimol/models/nnmodel.py | 103 | INFO | Uni-Mol(QSAR) | start training Uni-Mol:unimolv1
2024-07-17 17:47:11 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/20] train_loss: 0.6512, val_loss: 0.4486, val_auc: 0.8720, lr: 0.000098, 11.6s
2024-07-17 17:47:17 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/20] train_loss: 0.4759, val_loss: 0.4393, val_auc: 0.9088, lr: 0.000093, 5.5s
2024-07-17 17:47:22 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/20] train_loss: 0.4232, val_loss: 0.3040, val_auc: 0.9368, lr: 0.000088, 4.5s
2024-07-17 17:47:26 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/20] train_loss: 0.3154, val_loss: 0.2988, val_auc: 0.9322, lr: 0.000082, 4.5s
2024-07-17 17:47:31 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/20] train_loss: 0.2268, val_loss: 0.3729, val_auc: 0.9251, lr: 0.000077, 4.4s
2024-07-17 17:47:35 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/20] train_loss: 0.2273, val_loss: 0.5764, val_auc: 0.9191, lr: 0.000072, 4.4s
2024-07-17 17:47:40 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/20] train_loss: 0.1298, val_loss: 0.4541, val_auc: 0.9310, lr: 0.000067, 4.3s
2024-07-17 17:47:44 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/20] train_loss: 0.0938, val_loss: 0.6080, val_auc: 0.9088, lr: 0.000062, 4.5s
2024-07-17 17:47:44 | unimol/utils/metrics.py | 243 | WARNING | Uni-Mol(QSAR) | Early stopping at epoch: 8
2024-07-17 17:47:44 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-07-17 17:47:44 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 0, result {'auc': 0.9368421052631579, 'auroc': 0.9368421052631579, 'auprc': 0.8823329417709477, 'log_loss': 0.3080171648279897, 'acc': 0.8785714285714286, 'f1_score': 0.8089887640449439, 'mcc': 0.7200976807742538, 'precision': 0.8181818181818182, 'recall': 0.8, 'cohen_kappa': 0.72}
2024-07-17 17:47:45 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_no_h_220816.pt
2024-07-17 17:47:50 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/20] train_loss: 0.5967, val_loss: 0.4289, val_auc: 0.8760, lr: 0.000098, 4.3s
2024-07-17 17:47:54 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/20] train_loss: 0.4246, val_loss: 0.3442, val_auc: 0.9160, lr: 0.000093, 4.4s
2024-07-17 17:47:59 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/20] train_loss: 0.3502, val_loss: 0.6515, val_auc: 0.9067, lr: 0.000088, 4.3s
2024-07-17 17:48:03 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/20] train_loss: 0.3070, val_loss: 0.3463, val_auc: 0.9333, lr: 0.000082, 4.2s
2024-07-17 17:48:08 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/20] train_loss: 0.2769, val_loss: 0.4444, val_auc: 0.9284, lr: 0.000077, 4.3s
2024-07-17 17:48:12 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/20] train_loss: 0.3168, val_loss: 0.3439, val_auc: 0.9324, lr: 0.000072, 4.2s
2024-07-17 17:48:16 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/20] train_loss: 0.2651, val_loss: 0.7065, val_auc: 0.9080, lr: 0.000067, 4.3s
2024-07-17 17:48:21 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/20] train_loss: 0.2476, val_loss: 0.4660, val_auc: 0.9102, lr: 0.000062, 4.3s
2024-07-17 17:48:26 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [9/20] train_loss: 0.1318, val_loss: 0.7787, val_auc: 0.9127, lr: 0.000057, 5.0s
2024-07-17 17:48:26 | unimol/utils/metrics.py | 243 | WARNING | Uni-Mol(QSAR) | Early stopping at epoch: 9
2024-07-17 17:48:26 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-07-17 17:48:26 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 1, result {'auc': 0.9333333333333333, 'auroc': 0.9333333333333333, 'auprc': 0.8867574781831005, 'log_loss': 0.35297432385518085, 'acc': 0.8928571428571429, 'f1_score': 0.8571428571428572, 'mcc': 0.7739827625318121, 'precision': 0.8181818181818182, 'recall': 0.9, 'cohen_kappa': 0.7717391304347826}
2024-07-17 17:48:27 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_no_h_220816.pt
2024-07-17 17:48:32 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/20] train_loss: 0.6556, val_loss: 0.4963, val_auc: 0.8521, lr: 0.000098, 4.4s
2024-07-17 17:48:36 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/20] train_loss: 0.4495, val_loss: 0.3652, val_auc: 0.9090, lr: 0.000093, 4.4s
2024-07-17 17:48:42 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/20] train_loss: 0.3517, val_loss: 0.6135, val_auc: 0.8999, lr: 0.000088, 4.3s
2024-07-17 17:48:47 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/20] train_loss: 0.2684, val_loss: 0.4570, val_auc: 0.9210, lr: 0.000082, 4.5s
2024-07-17 17:48:52 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/20] train_loss: 0.1819, val_loss: 0.4764, val_auc: 0.9103, lr: 0.000077, 5.1s
2024-07-17 17:48:56 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/20] train_loss: 0.1614, val_loss: 0.5266, val_auc: 0.9130, lr: 0.000072, 4.3s
2024-07-17 17:49:01 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/20] train_loss: 0.0810, val_loss: 0.6567, val_auc: 0.9099, lr: 0.000067, 4.4s
2024-07-17 17:49:05 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/20] train_loss: 0.0690, val_loss: 0.6392, val_auc: 0.9035, lr: 0.000062, 4.3s
2024-07-17 17:49:09 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [9/20] train_loss: 0.0651, val_loss: 0.8091, val_auc: 0.9069, lr: 0.000057, 4.4s
2024-07-17 17:49:09 | unimol/utils/metrics.py | 243 | WARNING | Uni-Mol(QSAR) | Early stopping at epoch: 9
2024-07-17 17:49:10 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-07-17 17:49:10 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 2, result {'auc': 0.9209692028985508, 'auroc': 0.9209692028985508, 'auprc': 0.881805333111016, 'log_loss': 0.44749031981586346, 'acc': 0.8285714285714286, 'f1_score': 0.7000000000000001, 'mcc': 0.6102458740855465, 'precision': 0.875, 'recall': 0.5833333333333334, 'cohen_kappa': 0.5866141732283465}
2024-07-17 17:49:11 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_no_h_220816.pt
2024-07-17 17:49:15 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/20] train_loss: 0.6369, val_loss: 0.6772, val_auc: 0.7854, lr: 0.000098, 4.4s
2024-07-17 17:49:20 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/20] train_loss: 0.4516, val_loss: 0.4308, val_auc: 0.8713, lr: 0.000093, 4.3s
2024-07-17 17:49:25 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/20] train_loss: 0.3334, val_loss: 0.4203, val_auc: 0.8982, lr: 0.000088, 5.2s
2024-07-17 17:49:31 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/20] train_loss: 0.2943, val_loss: 0.3751, val_auc: 0.9037, lr: 0.000082, 5.1s
2024-07-17 17:49:36 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/20] train_loss: 0.2592, val_loss: 0.4622, val_auc: 0.9086, lr: 0.000077, 4.5s
2024-07-17 17:49:41 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/20] train_loss: 0.2222, val_loss: 0.4553, val_auc: 0.9064, lr: 0.000072, 4.6s
2024-07-17 17:49:45 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/20] train_loss: 0.1986, val_loss: 0.5191, val_auc: 0.9216, lr: 0.000067, 4.6s
2024-07-17 17:49:50 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/20] train_loss: 0.1894, val_loss: 0.6887, val_auc: 0.9068, lr: 0.000062, 4.5s
2024-07-17 17:49:54 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [9/20] train_loss: 0.1149, val_loss: 0.6962, val_auc: 0.8731, lr: 0.000057, 4.4s
2024-07-17 17:49:59 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [10/20] train_loss: 0.1123, val_loss: 0.6989, val_auc: 0.8954, lr: 0.000052, 4.4s
2024-07-17 17:50:03 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [11/20] train_loss: 0.0718, val_loss: 1.0569, val_auc: 0.8954, lr: 0.000046, 4.4s
2024-07-17 17:50:08 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [12/20] train_loss: 0.0931, val_loss: 0.9347, val_auc: 0.8894, lr: 0.000041, 4.6s
2024-07-17 17:50:08 | unimol/utils/metrics.py | 243 | WARNING | Uni-Mol(QSAR) | Early stopping at epoch: 12
2024-07-17 17:50:08 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-07-17 17:50:08 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 3, result {'auc': 0.9215686274509804, 'auroc': 0.9215686274509804, 'auprc': 0.8843896223781123, 'log_loss': 0.5236389451177924, 'acc': 0.8642857142857143, 'f1_score': 0.7912087912087913, 'mcc': 0.704062425880932, 'precision': 0.9, 'recall': 0.7058823529411765, 'cohen_kappa': 0.6928406466512702}
2024-07-17 17:50:09 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_no_h_220816.pt
2024-07-17 17:50:14 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/20] train_loss: 0.7003, val_loss: 0.6448, val_auc: 0.6662, lr: 0.000098, 4.3s
2024-07-17 17:50:18 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/20] train_loss: 0.5611, val_loss: 0.5539, val_auc: 0.8098, lr: 0.000093, 4.3s
2024-07-17 17:50:23 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/20] train_loss: 0.4389, val_loss: 0.4325, val_auc: 0.9169, lr: 0.000088, 4.9s
2024-07-17 17:50:28 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/20] train_loss: 0.4042, val_loss: 0.2911, val_auc: 0.9466, lr: 0.000082, 4.4s
2024-07-17 17:50:33 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/20] train_loss: 0.3121, val_loss: 0.3371, val_auc: 0.9393, lr: 0.000077, 4.4s
2024-07-17 17:50:38 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/20] train_loss: 0.2647, val_loss: 0.3885, val_auc: 0.9313, lr: 0.000072, 4.4s
2024-07-17 17:50:42 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/20] train_loss: 0.1881, val_loss: 0.2983, val_auc: 0.9540, lr: 0.000067, 4.3s
2024-07-17 17:50:47 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/20] train_loss: 0.1685, val_loss: 0.4049, val_auc: 0.9399, lr: 0.000062, 4.5s
2024-07-17 17:50:51 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [9/20] train_loss: 0.1534, val_loss: 0.6540, val_auc: 0.8705, lr: 0.000057, 4.4s
2024-07-17 17:50:56 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [10/20] train_loss: 0.1108, val_loss: 0.3943, val_auc: 0.9260, lr: 0.000052, 4.4s
2024-07-17 17:51:00 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [11/20] train_loss: 0.0784, val_loss: 0.5554, val_auc: 0.9109, lr: 0.000046, 4.5s
2024-07-17 17:51:05 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [12/20] train_loss: 0.0742, val_loss: 0.3906, val_auc: 0.9260, lr: 0.000041, 4.4s
2024-07-17 17:51:05 | unimol/utils/metrics.py | 243 | WARNING | Uni-Mol(QSAR) | Early stopping at epoch: 12
2024-07-17 17:51:05 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-07-17 17:51:05 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 4, result {'auc': 0.9540229885057472, 'auroc': 0.9540229885057472, 'auprc': 0.9278865030074934, 'log_loss': 0.2957740822673908, 'acc': 0.9285714285714286, 'f1_score': 0.9019607843137256, 'mcc': 0.8475280627789067, 'precision': 0.9387755102040817, 'recall': 0.8679245283018868, 'cohen_kappa': 0.8459167950693375}
2024-07-17 17:51:05 | unimol/models/nnmodel.py | 144 | INFO | Uni-Mol(QSAR) | Uni-Mol metrics score: {'auc': 0.9139787829226659, 'auroc': 0.9139787829226659, 'auprc': 0.8727395178107266, 'log_loss': 0.3855789671768434, 'acc': 0.8785714285714286, 'f1_score': 0.8179871520342612, 'mcc': 0.7300828090459858, 'precision': 0.8681818181818182, 'recall': 0.7732793522267206, 'cohen_kappa': 0.7273393822747686}
2024-07-17 17:51:05 | unimol/models/nnmodel.py | 145 | INFO | Uni-Mol(QSAR) | Uni-Mol & Metric result saved!
2024-07-17 17:51:05 | unimol/utils/metrics.py | 260 | INFO | Uni-Mol(QSAR) | metrics for threshold: accuracy_score
2024-07-17 17:51:05 | unimol/utils/metrics.py | 274 | INFO | Uni-Mol(QSAR) | best threshold: 0.367564390756582, metrics: 0.88
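The "Early stopping at epoch" lines in this log are consistent with a patience counter: training stops once the validation metric has gone early_stopping=5 epochs without improving. The sketch below is a generic patience rule, not the library's own implementation, and early_stopping_epoch is a hypothetical helper. The input values are the fold 0 val_auc figures from the log above.

```python
def early_stopping_epoch(val_scores, patience):
    """Return (stop_epoch, best_epoch), both 1-based, for a higher-is-better metric.

    Training stops at the first epoch where `patience` consecutive epochs
    have passed without beating the best score seen so far.
    """
    best_score = float("-inf")
    best_epoch = 0
    stale = 0  # epochs since the last improvement
    for epoch, score in enumerate(val_scores, start=1):
        if score > best_score:
            best_score, best_epoch, stale = score, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch, best_epoch
    return len(val_scores), best_epoch

# val_auc per epoch for fold 0, copied from the training log.
fold0_val_auc = [0.8720, 0.9088, 0.9368, 0.9322, 0.9251, 0.9191, 0.9310, 0.9088]
stop_epoch, best_epoch = early_stopping_epoch(fold0_val_auc, patience=5)
print(stop_epoch, best_epoch)  # 8 3
```

The best epoch found this way (epoch 3, val_auc 0.9368) matches the fold 0 result reported after "load model success!", which suggests the best checkpoint is reloaded before the fold is scored.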
[6]
import pandas as pd
from sklearn.metrics import roc_auc_score
cv_results = pd.DataFrame({'pred': clf.cv_pred.flatten(),
                           'smiles': clf.data['smiles'],
                           'target': clf.data['target'].flatten()})
print(cv_results.head())
print("Cross-validation result:",roc_auc_score(cv_results.target, cv_results.pred))
       pred                                             smiles  target
0  0.006084  CC1OC(=O)CC(O)CC(O)CC(O)CCC(O)C(O)CC(=O)CC(O)C...       0
1  0.009041     NCCCCC(NC(CCc1ccccc1)C(=O)O)C(=O)N2CCCC2C(=O)O       0
2  0.891787        c1ccc(cc1)C2=NCC(=O)Nc3c2cc(cc3)[N+](=O)[O]       1
3  0.088636                   CCN(CC)C(C)CN1c2ccccc2Sc3c1cccc3       1
4  0.007430  CC(CCCC(C)(C)O)C1CCC2C(=CC=C3CC(O)CC(O)C3=C)CC...       0
Cross-validation result: 0.9139787829226659
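The cross-validation score here is a ROC AUC over the pooled out-of-fold predictions. ROC AUC equals the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as half. The sketch below computes it that way in plain Python; pairwise_auc is a hypothetical helper meant only to illustrate the definition. It is applied to the five rows printed above, so its result describes those rows, not the full 700-molecule set.

```python
def pairwise_auc(targets, scores):
    """ROC AUC as the share of positive/negative pairs ranked correctly."""
    pos = [s for t, s in zip(targets, scores) if t == 1]
    neg = [s for t, s in zip(targets, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# The five rows shown by cv_results.head().
head_targets = [0, 0, 1, 1, 0]
head_preds = [0.006084, 0.009041, 0.891787, 0.088636, 0.007430]
print(pairwise_auc(head_targets, head_preds))  # 1.0
```

Row 3 shows why AUC and accuracy can disagree: its score of 0.088636 is low in absolute terms, yet it still ranks above every negative in this small sample.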
Uni-Mol Prediction Parameter Descriptions
Prediction takes two steps: load the trained model, then run the prediction;
- load_model: Path to the trained model;
- Prediction also accepts custom input conformations;
[7]
clf = MolPredict(load_model='./exp')
[8]
# You can view detailed information about the loaded model through the config, or you can check the corresponding config.yaml file in the specified path
clf.config
{'amp': True, 'anomaly_clean': True, 'batch_size': 16, 'cuda': True, 'data_type': 'molecule', 'epochs': 20, 'kfold': 5, 'learning_rate': 0.0001, 'logger_level': 1, 'max_epochs': 100, 'max_norm': 5.0, 'metrics': 'auc', 'model_name': 'unimolv1', 'num_classes': 1, 'patience': 5, 'remove_hs': True, 'seed': 42, 'smi_strict': True, 'smiles_col': 'SMILES', 'split': 'random', 'split_group_col': 'scaffold', 'split_method': '5fold_random', 'split_seed': 42, 'target_col_prefix': 'TARGET', 'target_cols': ['TARGET'], 'target_normalize': 'auto', 'task': 'classification', 'use_amp': True, 'use_cuda': True, 'warmup_ratio': 0.03}
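The config lists warmup_ratio: 0.03, and the training log prints lr at the end of each epoch. Those logged values are consistent with a linear warmup followed by a linear decay to zero, stepped once per batch. This is an inference from the numbers, not a statement about the scheduler the library actually uses. The sketch assumes 700 training molecules and 5 folds (so 560 molecules and 35 batches of 16 per epoch); linear_warmup_decay is a hypothetical helper.

```python
def linear_warmup_decay(step, total_steps, base_lr, warmup_ratio):
    """Learning rate after `step` optimizer steps under linear warmup and decay."""
    warmup_steps = round(total_steps * warmup_ratio)
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

steps_per_epoch = 560 // 16          # assumed: 560 training molecules per fold
total_steps = steps_per_epoch * 20   # epochs=20 from the config

for epoch in (1, 2, 3):
    lr = linear_warmup_decay(epoch * steps_per_epoch, total_steps, 1e-4, 0.03)
    print(f"epoch {epoch}: lr {lr:.6f}")
# epoch 1: lr 0.000098
# epoch 2: lr 0.000093
# epoch 3: lr 0.000088
```

Under the same assumption, a run with epochs=1 ends exactly at the last step of the schedule, which would explain the lr: 0.000000 printed in the fake-data training log.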
[9]
test_pred = clf.predict('mol_test.csv')
2024-07-17 17:51:05 | unimol/data/conformer.py | 62 | INFO | Uni-Mol(QSAR) | Start generating conformers...
367it [00:11, 32.00it/s]
2024-07-17 17:51:17 | unimol/data/conformer.py | 66 | INFO | Uni-Mol(QSAR) | Failed to generate conformers for 0.00% of molecules.
2024-07-17 17:51:17 | unimol/data/conformer.py | 68 | INFO | Uni-Mol(QSAR) | Failed to generate 3d conformers for 0.00% of molecules.
2024-07-17 17:51:18 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_no_h_220816.pt
2024-07-17 17:51:18 | unimol/models/nnmodel.py | 154 | INFO | Uni-Mol(QSAR) | start predict NNModel:unimolv1
2024-07-17 17:51:18 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-07-17 17:51:19 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-07-17 17:51:20 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-07-17 17:51:21 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-07-17 17:51:22 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
[10]
test_results = pd.DataFrame({'pred': test_pred.flatten(),
                             'smiles': clf.datahub.data['smiles']})
print(test_results.head())
       pred                                             smiles
0  0.011400  CC(CCC(=O)O)C1CCC2C3C(CC(=O)C12C)C4(C)CCC(=O)C...
1  0.661172       CC(=O)c1ccc2c(c1)Sc3ccccc3N2CCCN4CCN(CC4)CCO
2  0.030377  CCCN(CCC)C(=O)C(CCC(=O)OCCCN1CCN(CCOC(=O)Cc2c(...
3  0.011402  CC(C)CCCC(C)CCCC(C)CCCC1(C)CCc2c(C)c(O)c(C)c(C...
4  0.682624                       CCCN(CCC)CCc1cccc2c1CC(=O)N2
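The pred column holds scores between 0 and 1 rather than class labels. The training log reports a "best threshold" of 0.367564390756582 chosen with accuracy_score, so one way to obtain 0/1 labels is to cut the scores at that value. The sketch below shows a simple accuracy-based threshold search and the cut itself. Both helpers are hypothetical, and the library's own search (its candidate grid, and whether it compares with >= or >) may differ.

```python
def best_accuracy_threshold(targets, probs):
    """Scan the observed scores as candidate cut-offs; return (threshold, accuracy)."""
    if not targets or len(targets) != len(probs):
        raise ValueError("targets and probs must be non-empty and equally long")
    best_t, best_acc = 0.5, -1.0
    for t in sorted(set(probs)):
        hits = sum((p >= t) == bool(y) for p, y in zip(probs, targets))
        acc = hits / len(targets)
        if acc > best_acc:  # keep the lowest threshold among ties
            best_t, best_acc = t, acc
    return best_t, best_acc

def to_labels(probs, threshold):
    """Turn scores into 0/1 labels with a fixed cut-off."""
    return [int(p >= threshold) for p in probs]

# Threshold copied from the training log; scores from test_results.head().
logged_threshold = 0.367564390756582
head_test_preds = [0.011400, 0.661172, 0.030377, 0.011402, 0.682624]
print(to_labels(head_test_preds, logged_threshold))  # [0, 1, 0, 0, 1]
```

On real data the search would be run on the cross-validation predictions, never on the test set, so that the chosen threshold does not leak test information.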
[11]
# Train on custom conformations, using fake (random) data as an example
import numpy as np
custom_data = {'target': np.random.randint(2, size=100),
               'atoms': [['C', 'C', 'H', 'H', 'H', 'H'] for _ in range(100)],
               'coordinates': [np.random.randn(6, 3) for _ in range(100)],
               }
clf_fake = MolTrain(task='classification',
                    data_type='molecule',
                    epochs=1,
                    learning_rate=0.0001,
                    batch_size=1,
                    early_stopping=1,
                    metrics='auc',
                    split='random',
                    save_path='./exp_fake',
                    )
clf_fake.fit(custom_data)
2024-07-17 17:51:23 | unimol/train.py | 102 | INFO | Uni-Mol(QSAR) | Create output directory: ./exp_fake
2024-07-17 17:51:24 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2024-07-17 17:51:25 | unimol/models/nnmodel.py | 103 | INFO | Uni-Mol(QSAR) | start training Uni-Mol:unimolv1
2024-07-17 17:51:34 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/1] train_loss: 1.0921, val_loss: 0.7354, val_auc: 0.4062, lr: 0.000000, 9.4s
2024-07-17 17:51:35 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-07-17 17:51:35 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 0, result {'auc': 0.40625, 'auroc': 0.40625, 'auprc': 0.6417324104630607, 'log_loss': 0.73536017537117, 'acc': 0.6, 'f1_score': 0.7499999999999999, 'mcc': 0.0, 'precision': 0.6, 'recall': 1.0, 'cohen_kappa': 0.0}
2024-07-17 17:51:36 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2024-07-17 17:51:44 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/1] train_loss: 0.8989, val_loss: 0.8147, val_auc: 0.3077, lr: 0.000000, 7.3s
2024-07-17 17:51:44 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-07-17 17:51:44 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 1, result {'auc': 0.3076923076923077, 'auroc': 0.3076923076923077, 'auprc': 0.5976957055011354, 'log_loss': 0.8146905124187469, 'acc': 0.65, 'f1_score': 0.787878787878788, 'mcc': 0.0, 'precision': 0.65, 'recall': 1.0, 'cohen_kappa': 0.0}
2024-07-17 17:51:45 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2024-07-17 17:51:53 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/1] train_loss: 0.9036, val_loss: 1.1799, val_auc: 0.3900, lr: 0.000000, 7.5s
2024-07-17 17:51:53 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-07-17 17:51:53 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 2, result {'auc': 0.39, 'auroc': 0.39, 'auprc': 0.4560698125404008, 'log_loss': 1.1798837244510652, 'acc': 0.5, 'f1_score': 0.6666666666666666, 'mcc': 0.0, 'precision': 0.5, 'recall': 1.0, 'cohen_kappa': 0.0}
2024-07-17 17:51:54 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2024-07-17 17:52:02 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/1] train_loss: 1.0081, val_loss: 0.8340, val_auc: 0.6364, lr: 0.000000, 8.1s
2024-07-17 17:52:03 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-07-17 17:52:03 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 3, result {'auc': 0.6363636363636364, 'auroc': 0.6363636363636364, 'auprc': 0.743450096437009, 'log_loss': 0.8340079627931118, 'acc': 0.55, 'f1_score': 0.7096774193548387, 'mcc': 0.0, 'precision': 0.55, 'recall': 1.0, 'cohen_kappa': 0.0}
2024-07-17 17:52:04 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2024-07-17 17:52:11 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/1] train_loss: 1.0388, val_loss: 1.3238, val_auc: 0.1500, lr: 0.000000, 7.2s
2024-07-17 17:52:12 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-07-17 17:52:12 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 4, result {'auc': 0.15, 'auroc': 0.15, 'auprc': 0.3651321932281684, 'log_loss': 1.32375649176538, 'acc': 0.5, 'f1_score': 0.6666666666666666, 'mcc': 0.0, 'precision': 0.5, 'recall': 1.0, 'cohen_kappa': 0.0}
2024-07-17 17:52:12 | unimol/models/nnmodel.py | 144 | INFO | Uni-Mol(QSAR) | Uni-Mol metrics score: {'auc': 0.4330357142857143, 'auroc': 0.4330357142857143, 'auprc': 0.5002003789988358, 'log_loss': 0.9775397733598947, 'acc': 0.56, 'f1_score': 0.717948717948718, 'mcc': 0.0, 'precision': 0.56, 'recall': 1.0, 'cohen_kappa': 0.0}
2024-07-17 17:52:12 | unimol/models/nnmodel.py | 145 | INFO | Uni-Mol(QSAR) | Uni-Mol & Metric result saved!
2024-07-17 17:52:12 | unimol/utils/metrics.py | 260 | INFO | Uni-Mol(QSAR) | metrics for threshold: accuracy_score
2024-07-17 17:52:12 | unimol/utils/metrics.py | 274 | INFO | Uni-Mol(QSAR) | best threshold: 0.7384856939315796, metrics: 0.55
[12]
# Predict with custom conformations, using fake (random) data as an example
import numpy as np
custom_data = {
    # 'target' is not needed for prediction
    # 'target': np.random.randint(2, size=100),
    'atoms': [['C', 'C', 'H', 'H', 'H', 'H'] for _ in range(100)],
    'coordinates': [np.random.randn(6, 3) for _ in range(100)],
}
clf_fake = MolPredict(load_model='./exp_fake')
fake_predict = clf_fake.predict(custom_data)
2024-07-17 17:52:13 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2024-07-17 17:52:13 | unimol/models/nnmodel.py | 154 | INFO | Uni-Mol(QSAR) | start predict NNModel:unimolv1
2024-07-17 17:52:13 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-07-17 17:52:15 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-07-17 17:52:16 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-07-17 17:52:18 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-07-17 17:52:19 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!