Yani Guan

Recommended image: Uni-Mol:unimol-qsar:v0.5
Recommended machine type: c3_m4_1 * NVIDIA T4
Uni-Mol Property Prediction Practice - Regression Task - Melting Point Prediction of Organic/Electrolyte Molecules
©️ Copyright 2023 @ Authors
Authors:
Boshen Zeng 📨
Hongshuai Wang 📨
Date: 2023-07-03
License Agreement: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Quick Start: Click the Start Connection button above, select the unimol-qsar:v0.4 image and any GPU node configuration, and wait a moment to run.
Case Background
- The melting point is the temperature at which a substance changes from solid to liquid. At constant pressure, a pure substance that is heated through its melting point stays at that temperature until all of the solid has melted; only then does the temperature rise again.
- In the battery field, the melting point of electrolyte molecules is an important measure of their stability and usable temperature range. A good electrolyte needs a wide liquid range, and each application scenario calls for electrolytes whose melting points suit its operating conditions.
- Predicting the melting point of unmeasured molecules lets us screen a large chemical space for candidates that could serve as electrolytes.
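The screening idea in the last bullet can be sketched as a simple filter over melting points. The helper name `screen_by_melting_point` and the 0 ℃ cutoff are illustrative assumptions, not part of Uni-Mol. The five molecules and their measured melting points are copied from the sampled data printed in Step 2; in a real screen the values would be model predictions for unmeasured molecules.

```python
from typing import Dict, List, Tuple


def screen_by_melting_point(
    melting_points: Dict[str, float], max_mp_c: float
) -> List[Tuple[str, float]]:
    """Return (SMILES, melting point) pairs at or below max_mp_c, lowest first."""
    hits = [(smi, mp) for smi, mp in melting_points.items() if mp <= max_mp_c]
    return sorted(hits, key=lambda pair: pair[1])


# Measured melting points (in Celsius) of five molecules from the sampled data.
candidates = {
    "S=C=Nc1c(Cl)cccc1Cl": 43.0,
    "O=C(c1cccs1)c1cccs1": 89.5,
    "COS(=O)(=O)c1ccccc1": -4.0,
    "FC(F)(F)c1cccc(I)c1": -8.0,
    "NCc1ccco1": -70.0,
}

# Hypothetical cutoff: keep molecules that melt at or below 0 Celsius.
low_melting = screen_by_melting_point(candidates, max_mp_c=0.0)
for smiles, mp in low_melting:
    print(f"{smiles}: {mp}")
```

A melting point cutoff is only one of several criteria an electrolyte must meet, so a filter like this would normally be one stage of a longer screening pipeline.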
Step 1: Read Data
- The dataset contains SMILES strings and measured melting points (TARGET) for nearly 20,000 molecules.
- TARGET is a continuous value in degrees Celsius (℃).
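Before training, it can help to confirm that the file really has the two columns described above. The following is a minimal sketch, assuming the CSV uses the column names `SMILES` and `TARGET`; the five rows are copied from the sampled training data shown in Step 2 and are read from an in-memory string so the snippet runs without the downloaded files.

```python
import io

import pandas as pd

# Five rows copied from the sampled training data shown in this notebook.
csv_text = """SMILES,TARGET
S=C=Nc1c(Cl)cccc1Cl,43.0
O=C(c1cccs1)c1cccs1,89.5
COS(=O)(=O)c1ccccc1,-4.0
Nc1ccc(O)c(C(=O)O)c1,283.5
COc1ccc(C2(C)C=C(C)N=N2)cc1,70.0
"""
df = pd.read_csv(io.StringIO(csv_text))

# Check what the later steps rely on: no missing values and a numeric target.
n_missing = int(df.isna().sum().sum())
target_is_numeric = pd.api.types.is_numeric_dtype(df["TARGET"])
print(f"rows={len(df)}, missing={n_missing}, numeric target={target_is_numeric}")

# Summary statistics give a first look at the melting point range.
print(df["TARGET"].describe())
```

To run the same check on the real data, replace `io.StringIO(csv_text)` with `'./data/mp_train.csv'`.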
[1]
!wget -P ./data/ https://dp-public.oss-cn-beijing.aliyuncs.com/community/mp_test.csv
!wget -P ./data/ https://dp-public.oss-cn-beijing.aliyuncs.com/community/mp_train.csv
--2023-10-27 15:56:22--  https://dp-public.oss-cn-beijing.aliyuncs.com/community/mp_test.csv
Resolving ga.dp.tech (ga.dp.tech)... 10.255.254.18, 10.255.254.7, 10.255.254.37
Connecting to ga.dp.tech (ga.dp.tech)|10.255.254.18|:8118... connected.
Proxy request sent, awaiting response... 200 OK
Length: 69686 (68K) [text/csv]
./data/mp_test.csv.1: Read-only file system
Cannot write to ‘./data/mp_test.csv.1’ (Success).
--2023-10-27 15:56:23--  https://dp-public.oss-cn-beijing.aliyuncs.com/community/mp_train.csv
Resolving ga.dp.tech (ga.dp.tech)... 10.255.254.7, 10.255.254.18, 10.255.254.37
Connecting to ga.dp.tech (ga.dp.tech)|10.255.254.7|:8118... connected.
Proxy request sent, awaiting response... 200 OK
Length: 626850 (612K) [text/csv]
./data/mp_train.csv.1: Read-only file system
Cannot write to ‘./data/mp_train.csv.1’ (Success).
Step 2: Sampling Data
- The full melting point dataset is too large to train on within a short demonstration, so we randomly sample 10% of the training set and 10% of the test set.
- For better predictions, use the complete training and test sets instead.
[2]
import pandas as pd
# Load the complete training dataset
train_data_total = pd.read_csv('./data/mp_train.csv')
# Randomly sample 10% of the data for training
train_data = train_data_total.sample(frac=0.1, random_state=1)
print("------------ Sampled Train Data ------------") # Display the sampled training data
print(train_data)
# Rename the columns to "SMILES" and "TARGET"
train_data.columns = ["SMILES", "TARGET"]
# Save the randomly sampled dataset
train_data.to_csv('./data/mp_train_0.1.csv', index=False)  # index=False avoids writing the row index as an extra column
print('\n')
# Load the complete test dataset
test_data_total = pd.read_csv('./data/mp_test.csv')
# Randomly sample 10% of the test data
test_data = test_data_total.sample(frac=0.1, random_state=1)
print("------------ Sampled Test Data ------------") # Display the sampled test data
print(test_data)
# Rename the columns to "SMILES" and "TARGET"
test_data.columns = ["SMILES", "TARGET"]
# Save the randomly sampled test dataset
test_data.to_csv('./data/mp_test_0.1.csv', index=False)  # index=False avoids writing the row index as an extra column
------------ Sampled Train Data ------------
       SMILES                                    TARGET
17446  Cc1cc(C(O)=NCC(=N)O)c(C)n1-c1ccc(F)cc1    194.0
15336  Clc1ccccc1C(c1ccccc1)(c1ccccc1)n1ccnc1    147.5
16009  COc1cc(CO)c([N+](=O)[O-])cc1OC            146.0
1610   S=C=Nc1c(Cl)cccc1Cl                       43.0
9193   O=C(c1cccs1)c1cccs1                       89.5
...    ...                                       ...
2528   COS(=O)(=O)c1ccccc1                       -4.0
384    Nc1nc(N)c2nc(-c3ccccc3)c(N)nc2n1          316.0
2748   OCC1OC(OC2(CO)OC(CO)C(O)C2O)C(O)C(O)C1O   185.5
15312  COc1ccc(C2(C)C=C(C)N=N2)cc1               70.0
941    Nc1ccc(O)c(C(=O)O)c1                      283.5
[1762 rows x 2 columns]

------------ Sampled Test Data ------------
      SMILES                                              TARGET
900   COc1cc2c(cc1O)CCNC2C                                221.5
1279  FC(F)(F)c1cccc(I)c1                                 -8.0
1929  c1ccc(N=Nc2ccc(N=Nc3ccc(N=Nc4ccc(N=Nc5ccccc5)c...   275.0
953   COc1ccc2c(c1)N(CCCN(C)C)c1ccccc1S2                  46.0
705   CCc1c(C)ccc2c1CCC1C2CCC2CC(O)CCC21C                 172.0
...   ...                                                 ...
1227  Cl[Si](Cl)(Cl)c1ccccc1                              -127.0
37    C#CCCCCCCC(=O)O                                     19.0
816   COC(=O)c1cc(C)ccc1C1=NC(C)(C(C)C)C(=O)N1            133.0
1388  NCc1ccco1                                           -70.0
1591  O=C(O)c1cc(I)ccc1Cl                                 159.0
[196 rows x 2 columns]
Step 3: Dataset Distribution Visualization
[3]
import matplotlib.pyplot as plt
bins = 30
plt.figure(figsize=(8, 6))
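# Pass the bin count defined above; alpha keeps both overlapping histograms visible
plt.hist(train_data["TARGET"], bins=bins, alpha=0.6, label="Train Data")
plt.hist(test_data["TARGET"], bins=bins, alpha=0.6, label="Test Data")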
plt.ylabel("Count")
plt.xlabel("Melting Point (℃)")
plt.title("Distribution")
plt.legend(prop={'size': 12})
plt.tick_params(labelsize=14)
plt.tight_layout()
plt.savefig('./data/dataset_distribution_histogram.png',
format='png')
<Figure size 576x432 with 1 Axes>
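Step 4: Train the Model
- Use Uni-Mol's MolTrain interface to train a regression model on the sampled training set. As the log below shows, training starts from pretrained weights and runs five cross-validation folds (fold 0 to fold 4), each with early stopping.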
[4]
from unimol import MolTrain, MolPredict
import numpy as np
# Initialize a model for molecular regression task
clf = MolTrain(task='regression', # Regression task
data_type='molecule', # Data type: molecule
epochs=20, # Number of iterations, representing how many times the model goes through the entire training dataset.
# In each epoch, the model updates its parameters based on the training data to reduce prediction errors.
learning_rate=0.0001, # Learning rate
batch_size=16, # Batch size
early_stopping=5, # Stop early if performance doesn't improve for 5 epochs
metrics='r2', # Evaluation metric: R-squared
split='random', # Random data split
save_path='./data/mp_train', # Model save path
)
# Train the model using the training dataset
clf.fit('./data/mp_train_0.1.csv') # Training dataset file
# Load the trained model for prediction
clf = MolPredict(load_model='./data/mp_train') # Load the trained model
/opt/conda/lib/python3.8/site-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
2023-10-27 11:13:06 | unimol/data/datareader.py | 139 | INFO | Uni-Mol(QSAR) | Anomaly clean with 3 sigma threshold: 1762 -> 1759
2023-10-27 11:13:07 | unimol/data/conformer.py | 62 | INFO | Uni-Mol(QSAR) | Start generating conformers...
1759it [00:09, 180.91it/s]
2023-10-27 11:13:17 | unimol/data/conformer.py | 66 | INFO | Uni-Mol(QSAR) | Failed to generate conformers for 0.00% of molecules.
2023-10-27 11:13:17 | unimol/data/conformer.py | 68 | INFO | Uni-Mol(QSAR) | Failed to generate 3d conformers for 0.23% of molecules.
2023-10-27 11:13:17 | unimol/train.py | 88 | INFO | Uni-Mol(QSAR) | Output directory already exists: ./data/mp_train
2023-10-27 11:13:17 | unimol/train.py | 89 | INFO | Uni-Mol(QSAR) | Warning: Overwrite output directory: ./data/mp_train
2023-10-27 11:13:17 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2023-10-27 11:13:18 | unimol/models/nnmodel.py | 103 | INFO | Uni-Mol(QSAR) | start training Uni-Mol:unimolv1
2023-10-27 11:13:35 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/20] train_loss: 0.7981, val_loss: 0.8174, val_r2: 0.1709, lr: 0.000098, 14.2s
2023-10-27 11:13:43 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/20] train_loss: 0.6650, val_loss: 0.8618, val_r2: 0.1258, lr: 0.000093, 7.2s
2023-10-27 11:13:50 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/20] train_loss: 0.4852, val_loss: 0.3405, val_r2: 0.6547, lr: 0.000088, 7.1s
2023-10-27 11:13:58 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/20] train_loss: 0.3362, val_loss: 0.2807, val_r2: 0.7152, lr: 0.000082, 7.3s
2023-10-27 11:14:06 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/20] train_loss: 0.3118, val_loss: 0.3977, val_r2: 0.5966, lr: 0.000077, 7.4s
2023-10-27 11:14:13 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/20] train_loss: 0.3613, val_loss: 0.3394, val_r2: 0.6557, lr: 0.000072, 7.2s
2023-10-27 11:14:21 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/20] train_loss: 0.2672, val_loss: 0.2824, val_r2: 0.7136, lr: 0.000067, 7.3s
2023-10-27 11:14:28 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/20] train_loss: 0.2404, val_loss: 0.3605, val_r2: 0.6343, lr: 0.000062, 7.2s
2023-10-27 11:14:35 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [9/20] train_loss: 0.2012, val_loss: 0.3286, val_r2: 0.6667, lr: 0.000057, 7.2s
2023-10-27 11:14:35 | unimol/utils/metrics.py | 243 | WARNING | Uni-Mol(QSAR) | Early stopping at epoch: 9
2023-10-27 11:14:36 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2023-10-27 11:14:37 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 0, result {'r2': 0.7152259508146563, 'mae': 36.378147, 'pearsonr': 0.8621145073463755, 'spearmanr': 0.8670408766295951, 'mse': 2259.5159}
2023-10-27 11:14:38 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2023-10-27 11:14:45 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/20] train_loss: 0.8832, val_loss: 0.4824, val_r2: 0.5187, lr: 0.000098, 7.6s
2023-10-27 11:14:53 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/20] train_loss: 0.4846, val_loss: 0.3974, val_r2: 0.6035, lr: 0.000093, 7.3s
2023-10-27 11:15:01 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/20] train_loss: 0.3652, val_loss: 0.3183, val_r2: 0.6825, lr: 0.000088, 7.2s
2023-10-27 11:15:09 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/20] train_loss: 0.3275, val_loss: 0.4822, val_r2: 0.5189, lr: 0.000082, 7.2s
2023-10-27 11:15:16 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/20] train_loss: 0.2670, val_loss: 0.3954, val_r2: 0.6055, lr: 0.000077, 7.2s
2023-10-27 11:15:23 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/20] train_loss: 0.2787, val_loss: 0.3295, val_r2: 0.6713, lr: 0.000072, 7.2s
2023-10-27 11:15:31 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/20] train_loss: 0.2667, val_loss: 0.3637, val_r2: 0.6372, lr: 0.000067, 7.2s
2023-10-27 11:15:38 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/20] train_loss: 0.1794, val_loss: 0.3376, val_r2: 0.6632, lr: 0.000062, 7.2s
2023-10-27 11:15:38 | unimol/utils/metrics.py | 243 | WARNING | Uni-Mol(QSAR) | Early stopping at epoch: 8
2023-10-27 11:15:38 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2023-10-27 11:15:38 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 1, result {'r2': 0.6824719059028133, 'mae': 37.94684, 'pearsonr': 0.8295323942266773, 'spearmanr': 0.8361638357549785, 'mse': 2561.4954}
2023-10-27 11:15:39 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2023-10-27 11:15:47 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/20] train_loss: 0.6926, val_loss: 0.5382, val_r2: 0.5026, lr: 0.000098, 7.2s
2023-10-27 11:15:54 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/20] train_loss: 0.5031, val_loss: 0.6842, val_r2: 0.3677, lr: 0.000093, 7.3s
2023-10-27 11:16:02 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/20] train_loss: 0.4098, val_loss: 0.4996, val_r2: 0.5382, lr: 0.000088, 7.2s
2023-10-27 11:16:10 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/20] train_loss: 0.3782, val_loss: 0.3019, val_r2: 0.7210, lr: 0.000082, 7.3s
2023-10-27 11:16:18 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/20] train_loss: 0.2577, val_loss: 0.3733, val_r2: 0.6550, lr: 0.000077, 7.5s
2023-10-27 11:16:25 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/20] train_loss: 0.3145, val_loss: 0.4655, val_r2: 0.5698, lr: 0.000072, 7.4s
2023-10-27 11:16:32 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/20] train_loss: 0.2373, val_loss: 0.3799, val_r2: 0.6489, lr: 0.000067, 7.2s
2023-10-27 11:16:40 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/20] train_loss: 0.2243, val_loss: 0.3168, val_r2: 0.7073, lr: 0.000062, 7.2s
2023-10-27 11:16:47 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [9/20] train_loss: 0.1629, val_loss: 0.3071, val_r2: 0.7162, lr: 0.000057, 7.4s
2023-10-27 11:16:47 | unimol/utils/metrics.py | 243 | WARNING | Uni-Mol(QSAR) | Early stopping at epoch: 9
2023-10-27 11:16:48 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2023-10-27 11:16:49 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 2, result {'r2': 0.7209541654983871, 'mae': 37.152107, 'pearsonr': 0.8522750803974984, 'spearmanr': 0.8455151298262741, 'mse': 2430.106}
2023-10-27 11:16:49 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2023-10-27 11:16:57 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/20] train_loss: 0.7350, val_loss: 0.4964, val_r2: 0.4533, lr: 0.000098, 7.4s
2023-10-27 11:17:05 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/20] train_loss: 0.4795, val_loss: 0.3734, val_r2: 0.5887, lr: 0.000093, 7.7s
2023-10-27 11:17:13 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/20] train_loss: 0.3753, val_loss: 0.3127, val_r2: 0.6556, lr: 0.000088, 7.5s
2023-10-27 11:17:22 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/20] train_loss: 0.3156, val_loss: 0.3795, val_r2: 0.5821, lr: 0.000082, 7.5s
2023-10-27 11:17:29 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/20] train_loss: 0.3080, val_loss: 0.2998, val_r2: 0.6698, lr: 0.000077, 7.4s
2023-10-27 11:17:37 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/20] train_loss: 0.2515, val_loss: 0.2871, val_r2: 0.6838, lr: 0.000072, 7.3s
2023-10-27 11:17:45 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/20] train_loss: 0.2196, val_loss: 0.4742, val_r2: 0.4778, lr: 0.000067, 7.3s
2023-10-27 11:17:52 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/20] train_loss: 0.1741, val_loss: 0.3376, val_r2: 0.6282, lr: 0.000062, 7.3s
2023-10-27 11:17:59 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [9/20] train_loss: 0.1497, val_loss: 0.2989, val_r2: 0.6708, lr: 0.000057, 7.4s
2023-10-27 11:18:07 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [10/20] train_loss: 0.1360, val_loss: 0.3531, val_r2: 0.6111, lr: 0.000052, 7.6s
2023-10-27 11:18:15 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [11/20] train_loss: 0.1216, val_loss: 0.3443, val_r2: 0.6209, lr: 0.000046, 7.4s
2023-10-27 11:18:15 | unimol/utils/metrics.py | 243 | WARNING | Uni-Mol(QSAR) | Early stopping at epoch: 11
2023-10-27 11:18:16 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2023-10-27 11:18:16 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 3, result {'r2': 0.6837793590542565, 'mae': 35.573936, 'pearsonr': 0.8339808693169373, 'spearmanr': 0.8304578095495146, 'mse': 2310.9255}
2023-10-27 11:18:17 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2023-10-27 11:18:24 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/20] train_loss: 0.7210, val_loss: 0.3958, val_r2: 0.6082, lr: 0.000098, 7.3s
2023-10-27 11:18:33 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/20] train_loss: 0.4187, val_loss: 0.4111, val_r2: 0.5936, lr: 0.000093, 7.4s
2023-10-27 11:18:40 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/20] train_loss: 0.3838, val_loss: 0.3137, val_r2: 0.6896, lr: 0.000088, 7.4s
2023-10-27 11:18:48 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/20] train_loss: 0.3528, val_loss: 0.3431, val_r2: 0.6603, lr: 0.000082, 7.3s
2023-10-27 11:18:55 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/20] train_loss: 0.2693, val_loss: 0.2732, val_r2: 0.7296, lr: 0.000077, 7.6s
2023-10-27 11:19:04 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/20] train_loss: 0.2528, val_loss: 0.3643, val_r2: 0.6398, lr: 0.000072, 7.5s
2023-10-27 11:19:11 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/20] train_loss: 0.2164, val_loss: 0.2591, val_r2: 0.7436, lr: 0.000067, 7.4s
2023-10-27 11:19:20 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/20] train_loss: 0.1970, val_loss: 0.2915, val_r2: 0.7117, lr: 0.000062, 7.6s
2023-10-27 11:19:27 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [9/20] train_loss: 0.1452, val_loss: 0.2806, val_r2: 0.7222, lr: 0.000057, 7.7s
2023-10-27 11:19:35 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [10/20] train_loss: 0.1310, val_loss: 0.3252, val_r2: 0.6778, lr: 0.000052, 7.4s
2023-10-27 11:19:42 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [11/20] train_loss: 0.1214, val_loss: 0.3129, val_r2: 0.6899, lr: 0.000046, 7.5s
2023-10-27 11:19:50 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [12/20] train_loss: 0.1009, val_loss: 0.3293, val_r2: 0.6739, lr: 0.000041, 7.5s
2023-10-27 11:19:50 | unimol/utils/metrics.py | 243 | WARNING | Uni-Mol(QSAR) | Early stopping at epoch: 12
2023-10-27 11:19:51 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2023-10-27 11:19:52 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 4, result {'r2': 0.7435969685025966, 'mae': 34.783394, 'pearsonr': 0.8652752320161483, 'spearmanr': 0.8533703891931845, 'mse': 2084.7852}
2023-10-27 11:19:52 | unimol/models/nnmodel.py | 144 | INFO | Uni-Mol(QSAR) | Uni-Mol metrics score: {'r2': 0.7105567561951571, 'mae': 36.36778393940169, 'pearsonr': 0.843600909421961, 'spearmanr': 0.8417814272843047, 'mse': 2329.504593731551}
2023-10-27 11:19:52 | unimol/models/nnmodel.py | 145 | INFO | Uni-Mol(QSAR) | Uni-Mol & Metric result saved!
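The `early_stopping=5` setting explains why each fold in the log stops before epoch 20. The following is a minimal sketch of patience-based early stopping on a higher-is-better metric; it is an assumption about the general technique, not Uni-Mol's actual implementation, and the function name `early_stop_epoch` is hypothetical. The input values are the fold 0 `val_r2` entries from the log above.

```python
from typing import List, Tuple


def early_stop_epoch(val_scores: List[float], patience: int) -> Tuple[int, int]:
    """Return (stop_epoch, best_epoch), both 1-indexed, for a higher-is-better metric."""
    best_score = float("-inf")
    best_epoch = 0
    for epoch, score in enumerate(val_scores, start=1):
        if score > best_score:
            # New best validation score: remember it and reset the wait.
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` consecutive epochs: stop here.
            return epoch, best_epoch
    return len(val_scores), best_epoch


# val_r2 per epoch for fold 0, copied from the training log.
fold0_val_r2 = [0.1709, 0.1258, 0.6547, 0.7152, 0.5966, 0.6557, 0.7136, 0.6343, 0.6667]
stop_epoch, best_epoch = early_stop_epoch(fold0_val_r2, patience=5)
print(stop_epoch, best_epoch)  # → 9 4
```

This matches the log for fold 0: the best `val_r2` (0.7152) occurs at epoch 4, training stops at epoch 9, and the reported fold result has an r2 of about 0.7152, which suggests the best checkpoint is reloaded after stopping.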
Step 5: Predict Melting Point
[5]
# Visualize the model's training results by plotting experimental vs. predicted values.
# Compare the experimental values and predicted values from the test set.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from unimol import MolPredict
# Load the trained model
clf = MolPredict(load_model='./data/mp_train')
# Predict using the test dataset
predict = clf.predict('./data/mp_test_0.1.csv').reshape(-1)
# Read the experimental data file
test_set = pd.read_csv("./data/mp_test_0.1.csv", header='infer')
# Extract the experimental "TARGET" values (melting points)
test_mp = test_set["TARGET"].to_numpy()
# Calculate the range for predicted and experimental values to set axis limits for the plot
xmin = min(predict.flatten().min(), test_mp.min())
xmax = max(predict.flatten().max(), test_mp.max())
ymin = xmin
ymax = xmax
# Set the size of the plot
plt.figure(figsize=(8, 8))
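# Change the plot style; newer matplotlib versions renamed 'seaborn-darkgrid' to 'seaborn-v0_8-darkgrid'
style_name = 'seaborn-v0_8-darkgrid' if 'seaborn-v0_8-darkgrid' in plt.style.available else 'seaborn-darkgrid'
plt.style.use(style_name)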
# Set the x-axis and y-axis range
plt.xlim(xmin, xmax)
plt.ylim(ymin, ymax)
# Set the x-axis label to "Predicted Melting Point" and adjust the font size
plt.xlabel('Predicted Melting Point', fontsize=14)
# Set the y-axis label to "Experimental Melting Point" and adjust the font size
plt.ylabel('Experimental Melting Point', fontsize=14)
# Add a title "Experimental vs Predicted Melting Point" and adjust the font size
plt.title('Experimental vs Predicted Melting Point', fontsize=16)
# Plot the experimental vs predicted values using a scatter plot
# Set color, transparency, and labels
plt.scatter(predict.flatten(), test_mp, color='blue', alpha=0.6)
# Generate a line for y = x, indicating perfect predictions
x = np.linspace(*plt.xlim())
plt.plot(x, x, color='red', linestyle='--', linewidth=2) # Plot the y = x line to show where perfect predictions would lie
# Display the plot
plt.show()
<Figure size 576x576 with 1 Axes>
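The scatter plot gives a visual impression of the agreement; the same comparison can be put into numbers. The following sketch computes MAE, RMSE and R² with NumPy using their standard definitions. The function name `regression_metrics` is hypothetical, and the toy values are for illustration only; in this notebook you would pass `test_mp` and `predict` instead.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Compute MAE, RMSE and R^2 between experimental and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mae = float(np.mean(np.abs(residuals)))
    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "rmse": rmse, "r2": r2}


# Toy values for illustration only, not model output.
toy_true = [10.0, 20.0, 30.0, 40.0]
toy_pred = [12.0, 18.0, 33.0, 37.0]
metrics = regression_metrics(toy_true, toy_pred)
print(metrics)
```

Because MAE and RMSE are in ℃, they are often easier to interpret for melting points than R² alone.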
Practice
Try another run with the complete training and test sets, and compare its predictions with those from the 10% samples.
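One way to set up this comparison is to wrap the sampling from Step 2 in a small helper whose fraction can be changed. This is only a sketch: `prepare_split` is a hypothetical name, and the four-row frame stands in for the real data so the snippet runs on its own.

```python
import pandas as pd


def prepare_split(df: pd.DataFrame, frac: float = 1.0, seed: int = 1) -> pd.DataFrame:
    """Return a reproducible random fraction of df with SMILES/TARGET columns."""
    sampled = df.sample(frac=frac, random_state=seed)
    sampled.columns = ["SMILES", "TARGET"]
    return sampled.reset_index(drop=True)


# Tiny stand-in frame; in the notebook this would be pd.read_csv('./data/mp_train.csv').
demo = pd.DataFrame({
    "SMILES": [
        "S=C=Nc1c(Cl)cccc1Cl",
        "O=C(c1cccs1)c1cccs1",
        "COS(=O)(=O)c1ccccc1",
        "Nc1ccc(O)c(C(=O)O)c1",
    ],
    "TARGET": [43.0, 89.5, -4.0, 283.5],
})

full = prepare_split(demo, frac=1.0)    # complete set, shuffled
subset = prepare_split(demo, frac=0.5)  # half of the rows
print(len(full), len(subset))
```

With the real files, each result could be saved with `to_csv(..., index=False)` and passed to `clf.fit` with a separate `save_path` per run, so that the two sets of metrics can be compared side by side.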