Tags: Uni-Mol, Deep Learning
Yani Guan
Updated on 2024-10-17
Recommended image: Uni-Mol:unimol-qsar:v0.5
Recommended machine type: c3_m4_1 * NVIDIA T4
Uni-Mol Property Prediction Practice - Regression Task - Melting Point Prediction of Organic/Electrolyte Molecules
Case Background
Step1: Read Data
Step2: Sampling Data
Step 3: Dataset Distribution Visualization
Step4: Train the Model
Step5: Predict Melting Point
Practice

Uni-Mol Property Prediction Practice - Regression Task - Melting Point Prediction of Organic/Electrolyte Molecules

©️ Copyright 2023 @ Authors
Authors: Boshen Zeng 📨 Hongshuai Wang 📨
Date: 2023-07-03
License Agreement: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Quick Start: Click the Start Connection button above, select the unimol-qsar:v0.4 image and any GPU node configuration, and wait a moment to run.


Case Background

  • The melting point is the temperature at which a substance changes from solid to liquid. At constant pressure, a substance heated to its melting point stays at that temperature until all of the solid has melted; only then does the temperature rise again.
  • In the battery field, the melting point of electrolyte molecules is an important indicator of their stability and usable temperature range. A good electrolyte material needs a wide liquid range, and each application scenario calls for electrolytes whose melting points suit its operating conditions.
  • Predicting the melting point of unknown molecules helps us reverse-screen the possible chemical space for materials that can be used as electrolytes.

Step1: Read Data

  • The dataset contains the SMILES notation and the measured melting point (TARGET) of nearly 20,000 molecules
  • TARGET is a continuous value in degrees Celsius
[1]
!wget -P ./data/ https://dp-public.oss-cn-beijing.aliyuncs.com/community/mp_test.csv
!wget -P ./data/ https://dp-public.oss-cn-beijing.aliyuncs.com/community/mp_train.csv
--2023-10-27 15:56:22--  https://dp-public.oss-cn-beijing.aliyuncs.com/community/mp_test.csv
Resolving ga.dp.tech (ga.dp.tech)... 10.255.254.18, 10.255.254.7, 10.255.254.37
Connecting to ga.dp.tech (ga.dp.tech)|10.255.254.18|:8118... connected.
Proxy request sent, awaiting response... 200 OK
Length: 69686 (68K) [text/csv]
./data/mp_test.csv.1: Read-only file system

Cannot write to ‘./data/mp_test.csv.1’ (Success).
--2023-10-27 15:56:23--  https://dp-public.oss-cn-beijing.aliyuncs.com/community/mp_train.csv
Resolving ga.dp.tech (ga.dp.tech)... 10.255.254.7, 10.255.254.18, 10.255.254.37
Connecting to ga.dp.tech (ga.dp.tech)|10.255.254.7|:8118... connected.
Proxy request sent, awaiting response... 200 OK
Length: 626850 (612K) [text/csv]
./data/mp_train.csv.1: Read-only file system

Cannot write to ‘./data/mp_train.csv.1’ (Success).
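
The wget output above ends with "Read-only file system", so it may be worth confirming that the CSV files are readable and have the expected layout before moving on. The following is a minimal sketch, not part of the original notebook: `check_melting_point_table` is a hypothetical helper, and it assumes the two column names SMILES and TARGET that appear in the printed tables later in this notebook. The three inline rows are copied from that printed output.

```python
import io

import pandas as pd


def check_melting_point_table(df: pd.DataFrame) -> dict:
    """Hypothetical helper: validate the two-column layout used in this notebook."""
    missing = {"SMILES", "TARGET"} - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    # Coerce TARGET to numbers; anything that cannot be parsed becomes NaN.
    target = pd.to_numeric(df["TARGET"], errors="coerce")
    return {
        "rows": len(df),
        "empty_smiles": int(df["SMILES"].isna().sum()),
        "bad_targets": int(target.isna().sum()),
        "min_celsius": float(target.min()),
        "max_celsius": float(target.max()),
    }


# Three rows copied from the sampled training data printed in this notebook.
sample_csv = io.StringIO(
    "SMILES,TARGET\n"
    "S=C=Nc1c(Cl)cccc1Cl,43.0\n"
    "O=C(c1cccs1)c1cccs1,89.5\n"
    "COS(=O)(=O)c1ccccc1,-4.0\n"
)
report = check_melting_point_table(pd.read_csv(sample_csv))
print(report)
```

On the real data, the same helper could be called as `check_melting_point_table(pd.read_csv('./data/mp_train.csv'))`, assuming the file was downloaded successfully.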

Step2: Sampling Data

  • The full melting point dataset is too large to train on within a short demonstration, so we randomly sample 10% of the training set and 10% of the test set.
  • If you are interested, you can repeat the workflow with the complete training and test sets to get better predictions.
[2]
import pandas as pd

# Load the complete training dataset
train_data_total = pd.read_csv('./data/mp_train.csv')

# Randomly sample 10% of the data for training
train_data = train_data_total.sample(frac=0.1, random_state=1)
print("------------ Sampled Train Data ------------") # Display the sampled training data
print(train_data)

# Rename the columns to "SMILES" and "TARGET"
train_data.columns = ["SMILES", "TARGET"]

# Save the randomly sampled dataset (index=False keeps the row labels out of the file)
train_data.to_csv('./data/mp_train_0.1.csv', index=False)
print('\n')

# Load the complete test dataset
test_data_total = pd.read_csv('./data/mp_test.csv')

# Randomly sample 10% of the test data
test_data = test_data_total.sample(frac=0.1, random_state=1)
print("------------ Sampled Test Data ------------") # Display the sampled test data
print(test_data)

# Rename the columns to "SMILES" and "TARGET"
test_data.columns = ["SMILES", "TARGET"]

# Save the randomly sampled test dataset (index=False keeps the row labels out of the file)
test_data.to_csv('./data/mp_test_0.1.csv', index=False)

------------ Sampled Train Data ------------
                                        SMILES  TARGET
17446   Cc1cc(C(O)=NCC(=N)O)c(C)n1-c1ccc(F)cc1   194.0
15336   Clc1ccccc1C(c1ccccc1)(c1ccccc1)n1ccnc1   147.5
16009           COc1cc(CO)c([N+](=O)[O-])cc1OC   146.0
1610                       S=C=Nc1c(Cl)cccc1Cl    43.0
9193                       O=C(c1cccs1)c1cccs1    89.5
...                                        ...     ...
2528                       COS(=O)(=O)c1ccccc1    -4.0
384           Nc1nc(N)c2nc(-c3ccccc3)c(N)nc2n1   316.0
2748   OCC1OC(OC2(CO)OC(CO)C(O)C2O)C(O)C(O)C1O   185.5
15312              COc1ccc(C2(C)C=C(C)N=N2)cc1    70.0
941                       Nc1ccc(O)c(C(=O)O)c1   283.5

[1762 rows x 2 columns]


------------ Sampled Test Data ------------
                                                 SMILES  TARGET
900                                COc1cc2c(cc1O)CCNC2C   221.5
1279                                FC(F)(F)c1cccc(I)c1    -8.0
1929  c1ccc(N=Nc2ccc(N=Nc3ccc(N=Nc4ccc(N=Nc5ccccc5)c...   275.0
953                  COc1ccc2c(c1)N(CCCN(C)C)c1ccccc1S2    46.0
705                 CCc1c(C)ccc2c1CCC1C2CCC2CC(O)CCC21C   172.0
...                                                 ...     ...
1227                             Cl[Si](Cl)(Cl)c1ccccc1  -127.0
37                                      C#CCCCCCCC(=O)O    19.0
816            COC(=O)c1cc(C)ccc1C1=NC(C)(C(C)C)C(=O)N1   133.0
1388                                          NCc1ccco1   -70.0
1591                                O=C(O)c1cc(I)ccc1Cl   159.0

[196 rows x 2 columns]
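
The sampling step relies on two properties of `DataFrame.sample`: `frac=0.1` returns one tenth of the rows, and a fixed `random_state` returns the same rows on every run. The sketch below illustrates both on a synthetic table of 100 placeholder rows (not real molecules), and also shows a CSV round trip with `index=False`, which keeps the row labels out of the saved file.

```python
import io

import pandas as pd

# Synthetic stand-in for the full table: 100 placeholder rows (not real data).
full = pd.DataFrame({
    "SMILES": ["C" * (i + 1) for i in range(100)],
    "TARGET": [float(i) for i in range(100)],
})

# Same fraction and seed as in the notebook cell above.
first = full.sample(frac=0.1, random_state=1)
second = full.sample(frac=0.1, random_state=1)

# 10% of 100 rows is 10 rows, and the fixed seed selects the same rows both times.
print(len(first), first.index.equals(second.index))

# Round trip through CSV without writing the index column.
buffer = io.StringIO()
first.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)
print(list(restored.columns))
```

Changing `random_state` selects a different 10% subset, which is one way to check how sensitive the later results are to the particular sample.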

Step 3: Dataset Distribution Visualization

[3]
import matplotlib.pyplot as plt

bins = 30  # Number of histogram bins
plt.figure(figsize=(8, 6))
plt.hist(train_data["TARGET"], bins=bins, label="Train Data")
plt.hist(test_data["TARGET"], bins=bins, label="Test Data")

plt.ylabel("Count")
plt.xlabel("Melting Point (℃)")
plt.title("Distribution")
plt.legend(prop={'size': 12})
plt.tick_params(labelsize=14)
plt.tight_layout()

plt.savefig('./data/dataset_distribution_histogram.png',
format='png')
<Figure size 576x432 with 1 Axes>
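
When two histograms are drawn with separate `plt.hist` calls, each call picks its own bin edges, so bars from the training and test sets do not line up exactly. One possible refinement, sketched below with NumPy, is to compute a single set of edges from both splits and reuse it. The arrays hold only the ten TARGET values printed for each split in Step 2, as a small stand-in for the full columns.

```python
import numpy as np

# The ten TARGET values (degrees Celsius) printed for each split in Step 2;
# a small stand-in for train_data["TARGET"] and test_data["TARGET"].
train_target = np.array([194.0, 147.5, 146.0, 43.0, 89.5, -4.0, 316.0, 185.5, 70.0, 283.5])
test_target = np.array([221.5, -8.0, 275.0, 46.0, 172.0, -127.0, 19.0, 133.0, -70.0, 159.0])

# One set of 30 bins spanning the range of both splits together.
edges = np.histogram_bin_edges(np.concatenate([train_target, test_target]), bins=30)

# Counts per bin for each split, using the shared edges.
train_counts, _ = np.histogram(train_target, bins=edges)
test_counts, _ = np.histogram(test_target, bins=edges)

print(len(edges), train_counts.sum(), test_counts.sum())
```

The same `edges` array can be passed as `bins=edges` to both `plt.hist` calls, optionally with an `alpha` value below 1 so that overlapping bars remain visible.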

Step4: Train the Model

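  • Use the Uni-Mol tools (the unimol package) to train a regression model on the sampled data. As the log shows, training starts from pretrained weights and runs over five folds.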
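
The first line of the training log reports "Anomaly clean with 3 sigma threshold: 1762 -> 1759", meaning three molecules were dropped before training. The sketch below shows a generic three-sigma filter using only the standard library. Whether Uni-Mol applies exactly this rule (sample standard deviation, strict inequalities) is an assumption, and the target values are placeholders rather than real melting points.

```python
import statistics


def three_sigma_filter(values):
    """Keep values that lie within three standard deviations of the mean."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (assumed choice)
    low, high = mean - 3 * std, mean + 3 * std
    return [v for v in values if low < v < high]


# Placeholder targets: twenty ordinary values plus one extreme entry.
targets = [float(t) for t in range(40, 240, 10)] + [5000.0]
kept = three_sigma_filter(targets)
print(len(targets), "->", len(kept))
```

Note that the mean and standard deviation are computed with the extreme value included, so a filter of this kind can only remove outliers when the dataset is large enough for them not to dominate the statistics.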
[4]
from unimol import MolTrain, MolPredict
import numpy as np

# Initialize a model for the molecular regression task
clf = MolTrain(
    task='regression',            # Regression task
    data_type='molecule',         # Data type: molecule
    epochs=20,                    # Maximum number of epochs (full passes over the training data);
                                  # in each epoch the model updates its parameters to reduce prediction error
    learning_rate=0.0001,         # Learning rate
    batch_size=16,                # Batch size
    early_stopping=5,             # Stop early if the validation metric does not improve for 5 epochs
    metrics='r2',                 # Evaluation metric: R-squared
    split='random',               # Random data split
    save_path='./data/mp_train',  # Model save path
)

# Train the model using the training dataset
clf.fit('./data/mp_train_0.1.csv') # Training dataset file

# Load the trained model for prediction
clf = MolPredict(load_model='./data/mp_train') # Load the trained model
/opt/conda/lib/python3.8/site-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
2023-10-27 11:13:06 | unimol/data/datareader.py | 139 | INFO | Uni-Mol(QSAR) | Anomaly clean with 3 sigma threshold: 1762 -> 1759
2023-10-27 11:13:07 | unimol/data/conformer.py | 62 | INFO | Uni-Mol(QSAR) | Start generating conformers...
1759it [00:09, 180.91it/s]
2023-10-27 11:13:17 | unimol/data/conformer.py | 66 | INFO | Uni-Mol(QSAR) | Failed to generate conformers for 0.00% of molecules.
2023-10-27 11:13:17 | unimol/data/conformer.py | 68 | INFO | Uni-Mol(QSAR) | Failed to generate 3d conformers for 0.23% of molecules.
2023-10-27 11:13:17 | unimol/train.py | 88 | INFO | Uni-Mol(QSAR) | Output directory already exists: ./data/mp_train
2023-10-27 11:13:17 | unimol/train.py | 89 | INFO | Uni-Mol(QSAR) | Warning: Overwrite output directory: ./data/mp_train
2023-10-27 11:13:17 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2023-10-27 11:13:18 | unimol/models/nnmodel.py | 103 | INFO | Uni-Mol(QSAR) | start training Uni-Mol:unimolv1
2023-10-27 11:13:35 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/20] train_loss: 0.7981, val_loss: 0.8174, val_r2: 0.1709, lr: 0.000098, 14.2s
2023-10-27 11:13:43 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/20] train_loss: 0.6650, val_loss: 0.8618, val_r2: 0.1258, lr: 0.000093, 7.2s
2023-10-27 11:13:50 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/20] train_loss: 0.4852, val_loss: 0.3405, val_r2: 0.6547, lr: 0.000088, 7.1s
2023-10-27 11:13:58 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/20] train_loss: 0.3362, val_loss: 0.2807, val_r2: 0.7152, lr: 0.000082, 7.3s
2023-10-27 11:14:06 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/20] train_loss: 0.3118, val_loss: 0.3977, val_r2: 0.5966, lr: 0.000077, 7.4s
2023-10-27 11:14:13 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/20] train_loss: 0.3613, val_loss: 0.3394, val_r2: 0.6557, lr: 0.000072, 7.2s
2023-10-27 11:14:21 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/20] train_loss: 0.2672, val_loss: 0.2824, val_r2: 0.7136, lr: 0.000067, 7.3s
2023-10-27 11:14:28 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/20] train_loss: 0.2404, val_loss: 0.3605, val_r2: 0.6343, lr: 0.000062, 7.2s
2023-10-27 11:14:35 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [9/20] train_loss: 0.2012, val_loss: 0.3286, val_r2: 0.6667, lr: 0.000057, 7.2s
2023-10-27 11:14:35 | unimol/utils/metrics.py | 243 | WARNING | Uni-Mol(QSAR) | Early stopping at epoch: 9
2023-10-27 11:14:36 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2023-10-27 11:14:37 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 0, result {'r2': 0.7152259508146563, 'mae': 36.378147, 'pearsonr': 0.8621145073463755, 'spearmanr': 0.8670408766295951, 'mse': 2259.5159}
2023-10-27 11:14:38 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2023-10-27 11:14:45 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/20] train_loss: 0.8832, val_loss: 0.4824, val_r2: 0.5187, lr: 0.000098, 7.6s
2023-10-27 11:14:53 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/20] train_loss: 0.4846, val_loss: 0.3974, val_r2: 0.6035, lr: 0.000093, 7.3s
2023-10-27 11:15:01 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/20] train_loss: 0.3652, val_loss: 0.3183, val_r2: 0.6825, lr: 0.000088, 7.2s
2023-10-27 11:15:09 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/20] train_loss: 0.3275, val_loss: 0.4822, val_r2: 0.5189, lr: 0.000082, 7.2s
2023-10-27 11:15:16 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/20] train_loss: 0.2670, val_loss: 0.3954, val_r2: 0.6055, lr: 0.000077, 7.2s
2023-10-27 11:15:23 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/20] train_loss: 0.2787, val_loss: 0.3295, val_r2: 0.6713, lr: 0.000072, 7.2s
2023-10-27 11:15:31 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/20] train_loss: 0.2667, val_loss: 0.3637, val_r2: 0.6372, lr: 0.000067, 7.2s
2023-10-27 11:15:38 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/20] train_loss: 0.1794, val_loss: 0.3376, val_r2: 0.6632, lr: 0.000062, 7.2s
2023-10-27 11:15:38 | unimol/utils/metrics.py | 243 | WARNING | Uni-Mol(QSAR) | Early stopping at epoch: 8
2023-10-27 11:15:38 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2023-10-27 11:15:38 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 1, result {'r2': 0.6824719059028133, 'mae': 37.94684, 'pearsonr': 0.8295323942266773, 'spearmanr': 0.8361638357549785, 'mse': 2561.4954}
2023-10-27 11:15:39 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2023-10-27 11:15:47 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/20] train_loss: 0.6926, val_loss: 0.5382, val_r2: 0.5026, lr: 0.000098, 7.2s
2023-10-27 11:15:54 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/20] train_loss: 0.5031, val_loss: 0.6842, val_r2: 0.3677, lr: 0.000093, 7.3s
2023-10-27 11:16:02 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/20] train_loss: 0.4098, val_loss: 0.4996, val_r2: 0.5382, lr: 0.000088, 7.2s
2023-10-27 11:16:10 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/20] train_loss: 0.3782, val_loss: 0.3019, val_r2: 0.7210, lr: 0.000082, 7.3s
2023-10-27 11:16:18 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/20] train_loss: 0.2577, val_loss: 0.3733, val_r2: 0.6550, lr: 0.000077, 7.5s
2023-10-27 11:16:25 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/20] train_loss: 0.3145, val_loss: 0.4655, val_r2: 0.5698, lr: 0.000072, 7.4s
2023-10-27 11:16:32 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/20] train_loss: 0.2373, val_loss: 0.3799, val_r2: 0.6489, lr: 0.000067, 7.2s
2023-10-27 11:16:40 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/20] train_loss: 0.2243, val_loss: 0.3168, val_r2: 0.7073, lr: 0.000062, 7.2s
2023-10-27 11:16:47 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [9/20] train_loss: 0.1629, val_loss: 0.3071, val_r2: 0.7162, lr: 0.000057, 7.4s
2023-10-27 11:16:47 | unimol/utils/metrics.py | 243 | WARNING | Uni-Mol(QSAR) | Early stopping at epoch: 9
2023-10-27 11:16:48 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2023-10-27 11:16:49 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 2, result {'r2': 0.7209541654983871, 'mae': 37.152107, 'pearsonr': 0.8522750803974984, 'spearmanr': 0.8455151298262741, 'mse': 2430.106}
2023-10-27 11:16:49 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2023-10-27 11:16:57 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/20] train_loss: 0.7350, val_loss: 0.4964, val_r2: 0.4533, lr: 0.000098, 7.4s
2023-10-27 11:17:05 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/20] train_loss: 0.4795, val_loss: 0.3734, val_r2: 0.5887, lr: 0.000093, 7.7s
2023-10-27 11:17:13 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/20] train_loss: 0.3753, val_loss: 0.3127, val_r2: 0.6556, lr: 0.000088, 7.5s
2023-10-27 11:17:22 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/20] train_loss: 0.3156, val_loss: 0.3795, val_r2: 0.5821, lr: 0.000082, 7.5s
2023-10-27 11:17:29 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/20] train_loss: 0.3080, val_loss: 0.2998, val_r2: 0.6698, lr: 0.000077, 7.4s
2023-10-27 11:17:37 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/20] train_loss: 0.2515, val_loss: 0.2871, val_r2: 0.6838, lr: 0.000072, 7.3s
2023-10-27 11:17:45 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/20] train_loss: 0.2196, val_loss: 0.4742, val_r2: 0.4778, lr: 0.000067, 7.3s
2023-10-27 11:17:52 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/20] train_loss: 0.1741, val_loss: 0.3376, val_r2: 0.6282, lr: 0.000062, 7.3s
2023-10-27 11:17:59 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [9/20] train_loss: 0.1497, val_loss: 0.2989, val_r2: 0.6708, lr: 0.000057, 7.4s
2023-10-27 11:18:07 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [10/20] train_loss: 0.1360, val_loss: 0.3531, val_r2: 0.6111, lr: 0.000052, 7.6s
2023-10-27 11:18:15 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [11/20] train_loss: 0.1216, val_loss: 0.3443, val_r2: 0.6209, lr: 0.000046, 7.4s
2023-10-27 11:18:15 | unimol/utils/metrics.py | 243 | WARNING | Uni-Mol(QSAR) | Early stopping at epoch: 11
2023-10-27 11:18:16 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2023-10-27 11:18:16 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 3, result {'r2': 0.6837793590542565, 'mae': 35.573936, 'pearsonr': 0.8339808693169373, 'spearmanr': 0.8304578095495146, 'mse': 2310.9255}
2023-10-27 11:18:17 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2023-10-27 11:18:24 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/20] train_loss: 0.7210, val_loss: 0.3958, val_r2: 0.6082, lr: 0.000098, 7.3s
2023-10-27 11:18:33 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/20] train_loss: 0.4187, val_loss: 0.4111, val_r2: 0.5936, lr: 0.000093, 7.4s
2023-10-27 11:18:40 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/20] train_loss: 0.3838, val_loss: 0.3137, val_r2: 0.6896, lr: 0.000088, 7.4s
2023-10-27 11:18:48 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/20] train_loss: 0.3528, val_loss: 0.3431, val_r2: 0.6603, lr: 0.000082, 7.3s
2023-10-27 11:18:55 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/20] train_loss: 0.2693, val_loss: 0.2732, val_r2: 0.7296, lr: 0.000077, 7.6s
2023-10-27 11:19:04 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/20] train_loss: 0.2528, val_loss: 0.3643, val_r2: 0.6398, lr: 0.000072, 7.5s
2023-10-27 11:19:11 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/20] train_loss: 0.2164, val_loss: 0.2591, val_r2: 0.7436, lr: 0.000067, 7.4s
2023-10-27 11:19:20 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/20] train_loss: 0.1970, val_loss: 0.2915, val_r2: 0.7117, lr: 0.000062, 7.6s
2023-10-27 11:19:27 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [9/20] train_loss: 0.1452, val_loss: 0.2806, val_r2: 0.7222, lr: 0.000057, 7.7s
2023-10-27 11:19:35 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [10/20] train_loss: 0.1310, val_loss: 0.3252, val_r2: 0.6778, lr: 0.000052, 7.4s
2023-10-27 11:19:42 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [11/20] train_loss: 0.1214, val_loss: 0.3129, val_r2: 0.6899, lr: 0.000046, 7.5s
2023-10-27 11:19:50 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [12/20] train_loss: 0.1009, val_loss: 0.3293, val_r2: 0.6739, lr: 0.000041, 7.5s
2023-10-27 11:19:50 | unimol/utils/metrics.py | 243 | WARNING | Uni-Mol(QSAR) | Early stopping at epoch: 12
2023-10-27 11:19:51 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2023-10-27 11:19:52 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 4, result {'r2': 0.7435969685025966, 'mae': 34.783394, 'pearsonr': 0.8652752320161483, 'spearmanr': 0.8533703891931845, 'mse': 2084.7852}
2023-10-27 11:19:52 | unimol/models/nnmodel.py | 144 | INFO | Uni-Mol(QSAR) | Uni-Mol metrics score: 
{'r2': 0.7105567561951571, 'mae': 36.36778393940169, 'pearsonr': 0.843600909421961, 'spearmanr': 0.8417814272843047, 'mse': 2329.504593731551}
2023-10-27 11:19:52 | unimol/models/nnmodel.py | 145 | INFO | Uni-Mol(QSAR) | Uni-Mol & Metric result saved!
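
The "Early stopping at epoch" warnings in the log follow from `early_stopping=5`: training in a fold stops once the validation R² has gone five epochs without improving on its best value. The sketch below replays that rule on the `val_r2` values copied from the log. It is a plain-Python illustration of the patience rule, not Uni-Mol's actual implementation, and `early_stopping_epoch` is a hypothetical helper.

```python
def early_stopping_epoch(scores, patience):
    """Return (stop_epoch, best_epoch), both 1-based, for a higher-is-better metric.

    stop_epoch is None if the patience limit is never reached.
    """
    best_score = float("-inf")
    best_epoch = 0
    for epoch, score in enumerate(scores, start=1):
        if score > best_score:
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return None, best_epoch


# val_r2 values copied from the fold 0 and fold 3 entries of the log above.
fold0_val_r2 = [0.1709, 0.1258, 0.6547, 0.7152, 0.5966, 0.6557, 0.7136, 0.6343, 0.6667]
fold3_val_r2 = [0.4533, 0.5887, 0.6556, 0.5821, 0.6698, 0.6838,
                0.4778, 0.6282, 0.6708, 0.6111, 0.6209]

stop, best = early_stopping_epoch(fold0_val_r2, patience=5)
print(stop, best)  # 9 4
print(early_stopping_epoch(fold3_val_r2, patience=5))  # (11, 6)
```

For fold 0 the best epoch is 4 (val_r2 0.7152), which matches the r2 reported in the "fold 0, result" line, and the stop at epoch 9 matches the warning in the log.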

Step5: Predict Melting Point

[5]
# Visualize the model's training results by plotting experimental vs. predicted values.
# Compare the experimental values and predicted values from the test set.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from unimol import MolPredict

# Load the trained model
clf = MolPredict(load_model='./data/mp_train')

# Predict using the test dataset
predict = clf.predict('./data/mp_test_0.1.csv').reshape(-1)

# Read the experimental data file
test_set = pd.read_csv("./data/mp_test_0.1.csv", header='infer')

# Extract the experimental "TARGET" values (melting points)
test_mp = test_set["TARGET"].to_numpy()

# Calculate the range for predicted and experimental values to set axis limits for the plot
xmin = min(predict.flatten().min(), test_mp.min())
xmax = max(predict.flatten().max(), test_mp.max())
ymin = xmin
ymax = xmax

# Set the size of the plot
plt.figure(figsize=(8, 8))

# Change the plot style
# (Matplotlib 3.6 renamed this style to 'seaborn-v0_8-darkgrid'; use that name on newer versions)
plt.style.use('seaborn-darkgrid')

# Set the x-axis and y-axis range
plt.xlim(xmin, xmax)
plt.ylim(ymin, ymax)

# Set the x-axis label and adjust the font size
plt.xlabel('Predicted Melting Point (℃)', fontsize=14)

# Set the y-axis label and adjust the font size
plt.ylabel('Experimental Melting Point (℃)', fontsize=14)

# Add a title and adjust the font size
plt.title('Experimental vs Predicted Melting Point', fontsize=16)

# Plot the experimental vs predicted values using a scatter plot
# Set the color and transparency
plt.scatter(predict.flatten(), test_mp, color='blue', alpha=0.6)

# Generate a line for y = x, indicating perfect predictions
x = np.linspace(*plt.xlim())
plt.plot(x, x, color='red', linestyle='--', linewidth=2) # Plot the y = x line to show where perfect predictions would lie

# Display the plot
plt.show()

<Figure size 576x576 with 1 Axes>
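
The scatter plot gives a visual impression of the fit. To put numbers on it, the same kinds of metrics that appear in the training log (MAE, MSE, R²) can be computed for the test predictions. The sketch below does this in plain Python with a hypothetical helper `regression_metrics`. The example values are placeholders in degrees Celsius, not results from this notebook.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE and R^2 for paired sequences of numbers."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    # Total sum of squares around the mean of the experimental values.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot if ss_tot > 0 else float("nan")
    return {"mae": mae, "mse": mse, "rmse": math.sqrt(mse), "r2": r2}


# Placeholder values in degrees Celsius, not results from the notebook.
experimental = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]
metrics = regression_metrics(experimental, predicted)
print(metrics)
```

In the notebook, the call would be `regression_metrics(test_mp.tolist(), predict.tolist())`. For scale, the cross-validation MSE of about 2329.5 reported in the training log corresponds to an RMSE of roughly 48.3 ℃.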

Practice

Try running the workflow again with the entire training set and test set, then compare the prediction results of the two runs.
