Yani Guan

Updated on 2024-10-17

Recommended image: Uni-Mol:unimol-qsar:v0.5

Recommended machine type: c3_m4_1 * NVIDIA T4

Uni-Mol Property Prediction Practice - Regression Task - Melting Point Prediction of Organic/Electrolyte Molecules

Case Background

Step 1: Read Data

Step 2: Sampling Data

Step 3: Dataset Distribution Visualization

Step 4: Train the Model

Step 5: Predict Melting Point

Practice

©️ Copyright 2023 @ Authors
Authors: Boshen Zeng, Hongshuai Wang
Date: 2023-07-03
License Agreement: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Quick Start: Click the Start Connection button above, select the unimol-qsar:v0.4 image and any GPU node configuration, and wait a moment to run.

Case Background

The melting point is the temperature at which a substance transitions from the solid state to the liquid state. Typically, when a substance is heated at constant pressure and begins to melt, its temperature does not rise until all the solid has turned into liquid, after which the temperature continues to rise.
In the battery field, the melting point of electrolyte molecules is an important physical quantity for assessing their stability and usable temperature range. Good electrolyte materials need a wide liquid range, and different application scenarios call for electrolytes with melting points suited to their specific performance requirements.
Predicting the melting point of unknown molecules helps us reverse-screen candidate electrolyte materials from the possible chemical space.

Step 1: Read Data

The dataset contains SMILES strings and measured melting points (TARGET) for nearly 20,000 molecules.
TARGET is a continuous value in degrees Celsius.
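As a minimal sketch of what this table looks like in code, the snippet below reads a SMILES/TARGET table with pandas and checks the two assumptions stated above: both columns are present and TARGET is numeric. The helper name `load_melting_points` is hypothetical, and the three rows are copied from the sampled training output shown in Step 2, so the snippet runs without downloading anything.

```python
import io

import pandas as pd

# Three rows copied from the sampled training output shown in Step 2.
SAMPLE_CSV = """SMILES,TARGET
S=C=Nc1c(Cl)cccc1Cl,43.0
O=C(c1cccs1)c1cccs1,89.5
COS(=O)(=O)c1ccccc1,-4.0
"""


def load_melting_points(source):
    """Read a SMILES/TARGET table and check its basic schema (hypothetical helper)."""
    df = pd.read_csv(source)
    missing = {"SMILES", "TARGET"} - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    # TARGET must be numeric (degrees Celsius) for a regression task.
    df["TARGET"] = pd.to_numeric(df["TARGET"], errors="raise")
    return df


df = load_melting_points(io.StringIO(SAMPLE_CSV))
print(df.shape)
print(df["TARGET"].describe())
```

In the notebook itself, the same helper could be pointed at './data/mp_train.csv' instead of the in-memory sample.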

[1]

!wget -P ./data/ https://dp-public.oss-cn-beijing.aliyuncs.com/community/mp_test.csv

!wget -P ./data/ https://dp-public.oss-cn-beijing.aliyuncs.com/community/mp_train.csv

--2023-10-27 15:56:22--  https://dp-public.oss-cn-beijing.aliyuncs.com/community/mp_test.csv
Resolving ga.dp.tech (ga.dp.tech)... 10.255.254.18, 10.255.254.7, 10.255.254.37
Connecting to ga.dp.tech (ga.dp.tech)|10.255.254.18|:8118... connected.
Proxy request sent, awaiting response... 200 OK
Length: 69686 (68K) [text/csv]
./data/mp_test.csv.1: Read-only file system

Cannot write to ‘./data/mp_test.csv.1’ (Success).
--2023-10-27 15:56:23--  https://dp-public.oss-cn-beijing.aliyuncs.com/community/mp_train.csv
Resolving ga.dp.tech (ga.dp.tech)... 10.255.254.7, 10.255.254.18, 10.255.254.37
Connecting to ga.dp.tech (ga.dp.tech)|10.255.254.7|:8118... connected.
Proxy request sent, awaiting response... 200 OK
Length: 626850 (612K) [text/csv]
./data/mp_train.csv.1: Read-only file system

Cannot write to ‘./data/mp_train.csv.1’ (Success).

Step 2: Sampling Data

The full melting point dataset is too large to train on within a short demonstration, so here we randomly sample 10% of the training set and 10% of the test set.
If you are interested, use the complete training and test sets for better prediction results.
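The sampling in this step relies on `DataFrame.sample` with a fixed `random_state`, which makes the 10% subset reproducible across runs. The sketch below illustrates that behavior on a small synthetic table; the `mol_0`, `mol_1`, ... strings are placeholders, not real SMILES, and the 50-row size is chosen only to keep the example short.

```python
import pandas as pd

# Synthetic stand-in for the full table: 50 rows with placeholder names.
full = pd.DataFrame({
    "SMILES": [f"mol_{i}" for i in range(50)],
    "TARGET": [float(i) for i in range(50)],
})

# Same fraction and same seed: the two samples contain identical rows.
first = full.sample(frac=0.1, random_state=1)
second = full.sample(frac=0.1, random_state=1)

print(len(first))            # number of sampled rows (10% of 50)
print(first.equals(second))  # whether both samples are identical
print(first.index.tolist())  # original row labels are kept by sample()
```

Because `sample` keeps the original row labels, the sampled frame's index is no longer 0, 1, 2, ...; that is why the index values in the printed output of this step look shuffled.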

[2]

import pandas as pd

# Load the complete training dataset
train_data_total = pd.read_csv('./data/mp_train.csv')

# Randomly sample 10% of the data for training (fixed seed for reproducibility)
train_data = train_data_total.sample(frac=0.1, random_state=1)
print("------------ Sampled Train Data ------------")  # Display the sampled training data
print(train_data)

# Rename the columns to "SMILES" and "TARGET"
train_data.columns = ["SMILES", "TARGET"]

# Save the sampled training set; index=False avoids writing the row index as an extra column
train_data.to_csv('./data/mp_train_0.1.csv', index=False)
print('\n')

# Load the complete test dataset
test_data_total = pd.read_csv('./data/mp_test.csv')

# Randomly sample 10% of the test data (fixed seed for reproducibility)
test_data = test_data_total.sample(frac=0.1, random_state=1)
print("------------ Sampled Test Data ------------")  # Display the sampled test data
print(test_data)

# Rename the columns to "SMILES" and "TARGET"
test_data.columns = ["SMILES", "TARGET"]

# Save the sampled test set; index=False avoids writing the row index as an extra column
test_data.to_csv('./data/mp_test_0.1.csv', index=False)

------------ Sampled Train Data ------------
                                        SMILES  TARGET
17446   Cc1cc(C(O)=NCC(=N)O)c(C)n1-c1ccc(F)cc1   194.0
15336   Clc1ccccc1C(c1ccccc1)(c1ccccc1)n1ccnc1   147.5
16009           COc1cc(CO)c([N+](=O)[O-])cc1OC   146.0
1610                       S=C=Nc1c(Cl)cccc1Cl    43.0
9193                       O=C(c1cccs1)c1cccs1    89.5
...                                        ...     ...
2528                       COS(=O)(=O)c1ccccc1    -4.0
384           Nc1nc(N)c2nc(-c3ccccc3)c(N)nc2n1   316.0
2748   OCC1OC(OC2(CO)OC(CO)C(O)C2O)C(O)C(O)C1O   185.5
15312              COc1ccc(C2(C)C=C(C)N=N2)cc1    70.0
941                       Nc1ccc(O)c(C(=O)O)c1   283.5

[1762 rows x 2 columns]


------------ Sampled Test Data ------------
                                                 SMILES  TARGET
900                                COc1cc2c(cc1O)CCNC2C   221.5
1279                                FC(F)(F)c1cccc(I)c1    -8.0
1929  c1ccc(N=Nc2ccc(N=Nc3ccc(N=Nc4ccc(N=Nc5ccccc5)c...   275.0
953                  COc1ccc2c(c1)N(CCCN(C)C)c1ccccc1S2    46.0
705                 CCc1c(C)ccc2c1CCC1C2CCC2CC(O)CCC21C   172.0
...                                                 ...     ...
1227                             Cl[Si](Cl)(Cl)c1ccccc1  -127.0
37                                      C#CCCCCCCC(=O)O    19.0
816            COC(=O)c1cc(C)ccc1C1=NC(C)(C(C)C)C(=O)N1   133.0
1388                                          NCc1ccco1   -70.0
1591                                O=C(O)c1cc(I)ccc1Cl   159.0

[196 rows x 2 columns]

Step 3: Dataset Distribution Visualization

[3]

import matplotlib.pyplot as plt

bins = 30  # Number of histogram bins
plt.figure(figsize=(8, 6))

# Pass bins explicitly; alpha < 1 keeps both distributions visible where they overlap
plt.hist(train_data["TARGET"], bins=bins, alpha=0.6, label="Train Data")
plt.hist(test_data["TARGET"], bins=bins, alpha=0.6, label="Test Data")
plt.ylabel("Count")
plt.xlabel("Melting Point (℃)")
plt.title("Distribution")
plt.legend(prop={'size': 12})
plt.tick_params(labelsize=14)
plt.tight_layout()
plt.savefig('./data/dataset_distribution_histogram.png', format='png')

<Figure size 576x432 with 1 Axes>
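The histogram also helps spot extreme TARGET values. The training log in Step 4 reports an "Anomaly clean with 3 sigma threshold" step that reduces 1762 molecules to 1759. The sketch below shows one common reading of such a filter, assuming it keeps values within three standard deviations of the mean; Uni-Mol's actual implementation is not shown in this notebook, so treat `three_sigma_mask` as a hypothetical illustration. The input values are synthetic.

```python
import numpy as np


def three_sigma_mask(values, n_sigma=3.0):
    """Boolean mask of values within n_sigma standard deviations of the mean."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    std = values.std()
    return np.abs(values - mean) <= n_sigma * std


# Synthetic targets: 200 evenly spaced values plus one extreme value.
targets = np.concatenate([np.linspace(0.0, 200.0, 200), [1000.0]])
mask = three_sigma_mask(targets)

print(int(mask.sum()), "of", targets.size, "values kept")
print("extreme value kept:", bool(mask[-1]))
```

In the notebook, the same mask could be applied to `train_data["TARGET"]` to see which molecules such a filter would drop.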

Step 4: Train the Model

Use the Uni-Mol tool to train a regression model on the sampled training data.

[4]

from unimol import MolTrain, MolPredict
import numpy as np

# Initialize a model for a molecular regression task
clf = MolTrain(
    task='regression',            # Regression task
    data_type='molecule',         # Data type: molecule
    epochs=20,                    # Maximum number of passes over the entire training dataset.
                                  # In each epoch, the model updates its parameters based on
                                  # the training data to reduce prediction errors.
    learning_rate=0.0001,         # Learning rate
    batch_size=16,                # Batch size
    early_stopping=5,             # Stop early if performance doesn't improve for 5 epochs
    metrics='r2',                 # Evaluation metric: R-squared
    split='random',               # Random data split
    save_path='./data/mp_train',  # Model save path
)

# Train the model using the sampled training dataset
clf.fit('./data/mp_train_0.1.csv')

# Load the trained model for prediction
clf = MolPredict(load_model='./data/mp_train')

/opt/conda/lib/python3.8/site-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
2023-10-27 11:13:06 | unimol/data/datareader.py | 139 | INFO | Uni-Mol(QSAR) | Anomaly clean with 3 sigma threshold: 1762 -> 1759
2023-10-27 11:13:07 | unimol/data/conformer.py | 62 | INFO | Uni-Mol(QSAR) | Start generating conformers...
1759it [00:09, 180.91it/s]
2023-10-27 11:13:17 | unimol/data/conformer.py | 66 | INFO | Uni-Mol(QSAR) | Failed to generate conformers for 0.00% of molecules.
2023-10-27 11:13:17 | unimol/data/conformer.py | 68 | INFO | Uni-Mol(QSAR) | Failed to generate 3d conformers for 0.23% of molecules.
2023-10-27 11:13:17 | unimol/train.py | 88 | INFO | Uni-Mol(QSAR) | Output directory already exists: ./data/mp_train
2023-10-27 11:13:17 | unimol/train.py | 89 | INFO | Uni-Mol(QSAR) | Warning: Overwrite output directory: ./data/mp_train
2023-10-27 11:13:17 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2023-10-27 11:13:18 | unimol/models/nnmodel.py | 103 | INFO | Uni-Mol(QSAR) | start training Uni-Mol:unimolv1
2023-10-27 11:13:35 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/20] train_loss: 0.7981, val_loss: 0.8174, val_r2: 0.1709, lr: 0.000098, 14.2s
2023-10-27 11:13:43 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/20] train_loss: 0.6650, val_loss: 0.8618, val_r2: 0.1258, lr: 0.000093, 7.2s
2023-10-27 11:13:50 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/20] train_loss: 0.4852, val_loss: 0.3405, val_r2: 0.6547, lr: 0.000088, 7.1s
2023-10-27 11:13:58 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/20] train_loss: 0.3362, val_loss: 0.2807, val_r2: 0.7152, lr: 0.000082, 7.3s
2023-10-27 11:14:06 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/20] train_loss: 0.3118, val_loss: 0.3977, val_r2: 0.5966, lr: 0.000077, 7.4s
2023-10-27 11:14:13 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/20] train_loss: 0.3613, val_loss: 0.3394, val_r2: 0.6557, lr: 0.000072, 7.2s
2023-10-27 11:14:21 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/20] train_loss: 0.2672, val_loss: 0.2824, val_r2: 0.7136, lr: 0.000067, 7.3s
2023-10-27 11:14:28 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/20] train_loss: 0.2404, val_loss: 0.3605, val_r2: 0.6343, lr: 0.000062, 7.2s
2023-10-27 11:14:35 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [9/20] train_loss: 0.2012, val_loss: 0.3286, val_r2: 0.6667, lr: 0.000057, 7.2s
2023-10-27 11:14:35 | unimol/utils/metrics.py | 243 | WARNING | Uni-Mol(QSAR) | Early stopping at epoch: 9
2023-10-27 11:14:36 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2023-10-27 11:14:37 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 0, result {'r2': 0.7152259508146563, 'mae': 36.378147, 'pearsonr': 0.8621145073463755, 'spearmanr': 0.8670408766295951, 'mse': 2259.5159}
2023-10-27 11:14:38 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2023-10-27 11:14:45 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/20] train_loss: 0.8832, val_loss: 0.4824, val_r2: 0.5187, lr: 0.000098, 7.6s
2023-10-27 11:14:53 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/20] train_loss: 0.4846, val_loss: 0.3974, val_r2: 0.6035, lr: 0.000093, 7.3s
2023-10-27 11:15:01 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/20] train_loss: 0.3652, val_loss: 0.3183, val_r2: 0.6825, lr: 0.000088, 7.2s
2023-10-27 11:15:09 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/20] train_loss: 0.3275, val_loss: 0.4822, val_r2: 0.5189, lr: 0.000082, 7.2s
2023-10-27 11:15:16 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/20] train_loss: 0.2670, val_loss: 0.3954, val_r2: 0.6055, lr: 0.000077, 7.2s
2023-10-27 11:15:23 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/20] train_loss: 0.2787, val_loss: 0.3295, val_r2: 0.6713, lr: 0.000072, 7.2s
2023-10-27 11:15:31 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/20] train_loss: 0.2667, val_loss: 0.3637, val_r2: 0.6372, lr: 0.000067, 7.2s
2023-10-27 11:15:38 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/20] train_loss: 0.1794, val_loss: 0.3376, val_r2: 0.6632, lr: 0.000062, 7.2s
2023-10-27 11:15:38 | unimol/utils/metrics.py | 243 | WARNING | Uni-Mol(QSAR) | Early stopping at epoch: 8
2023-10-27 11:15:38 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2023-10-27 11:15:38 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 1, result {'r2': 0.6824719059028133, 'mae': 37.94684, 'pearsonr': 0.8295323942266773, 'spearmanr': 0.8361638357549785, 'mse': 2561.4954}
2023-10-27 11:15:39 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2023-10-27 11:15:47 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/20] train_loss: 0.6926, val_loss: 0.5382, val_r2: 0.5026, lr: 0.000098, 7.2s
2023-10-27 11:15:54 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/20] train_loss: 0.5031, val_loss: 0.6842, val_r2: 0.3677, lr: 0.000093, 7.3s
2023-10-27 11:16:02 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/20] train_loss: 0.4098, val_loss: 0.4996, val_r2: 0.5382, lr: 0.000088, 7.2s
2023-10-27 11:16:10 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/20] train_loss: 0.3782, val_loss: 0.3019, val_r2: 0.7210, lr: 0.000082, 7.3s
2023-10-27 11:16:18 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/20] train_loss: 0.2577, val_loss: 0.3733, val_r2: 0.6550, lr: 0.000077, 7.5s
2023-10-27 11:16:25 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/20] train_loss: 0.3145, val_loss: 0.4655, val_r2: 0.5698, lr: 0.000072, 7.4s
2023-10-27 11:16:32 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/20] train_loss: 0.2373, val_loss: 0.3799, val_r2: 0.6489, lr: 0.000067, 7.2s
2023-10-27 11:16:40 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/20] train_loss: 0.2243, val_loss: 0.3168, val_r2: 0.7073, lr: 0.000062, 7.2s
2023-10-27 11:16:47 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [9/20] train_loss: 0.1629, val_loss: 0.3071, val_r2: 0.7162, lr: 0.000057, 7.4s
2023-10-27 11:16:47 | unimol/utils/metrics.py | 243 | WARNING | Uni-Mol(QSAR) | Early stopping at epoch: 9
2023-10-27 11:16:48 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2023-10-27 11:16:49 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 2, result {'r2': 0.7209541654983871, 'mae': 37.152107, 'pearsonr': 0.8522750803974984, 'spearmanr': 0.8455151298262741, 'mse': 2430.106}
2023-10-27 11:16:49 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2023-10-27 11:16:57 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/20] train_loss: 0.7350, val_loss: 0.4964, val_r2: 0.4533, lr: 0.000098, 7.4s
2023-10-27 11:17:05 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/20] train_loss: 0.4795, val_loss: 0.3734, val_r2: 0.5887, lr: 0.000093, 7.7s
2023-10-27 11:17:13 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/20] train_loss: 0.3753, val_loss: 0.3127, val_r2: 0.6556, lr: 0.000088, 7.5s
2023-10-27 11:17:22 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/20] train_loss: 0.3156, val_loss: 0.3795, val_r2: 0.5821, lr: 0.000082, 7.5s
2023-10-27 11:17:29 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/20] train_loss: 0.3080, val_loss: 0.2998, val_r2: 0.6698, lr: 0.000077, 7.4s
2023-10-27 11:17:37 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/20] train_loss: 0.2515, val_loss: 0.2871, val_r2: 0.6838, lr: 0.000072, 7.3s
2023-10-27 11:17:45 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/20] train_loss: 0.2196, val_loss: 0.4742, val_r2: 0.4778, lr: 0.000067, 7.3s
2023-10-27 11:17:52 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/20] train_loss: 0.1741, val_loss: 0.3376, val_r2: 0.6282, lr: 0.000062, 7.3s
2023-10-27 11:17:59 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [9/20] train_loss: 0.1497, val_loss: 0.2989, val_r2: 0.6708, lr: 0.000057, 7.4s
2023-10-27 11:18:07 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [10/20] train_loss: 0.1360, val_loss: 0.3531, val_r2: 0.6111, lr: 0.000052, 7.6s
2023-10-27 11:18:15 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [11/20] train_loss: 0.1216, val_loss: 0.3443, val_r2: 0.6209, lr: 0.000046, 7.4s
2023-10-27 11:18:15 | unimol/utils/metrics.py | 243 | WARNING | Uni-Mol(QSAR) | Early stopping at epoch: 11
2023-10-27 11:18:16 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2023-10-27 11:18:16 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 3, result {'r2': 0.6837793590542565, 'mae': 35.573936, 'pearsonr': 0.8339808693169373, 'spearmanr': 0.8304578095495146, 'mse': 2310.9255}
2023-10-27 11:18:17 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2023-10-27 11:18:24 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/20] train_loss: 0.7210, val_loss: 0.3958, val_r2: 0.6082, lr: 0.000098, 7.3s
2023-10-27 11:18:33 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/20] train_loss: 0.4187, val_loss: 0.4111, val_r2: 0.5936, lr: 0.000093, 7.4s
2023-10-27 11:18:40 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/20] train_loss: 0.3838, val_loss: 0.3137, val_r2: 0.6896, lr: 0.000088, 7.4s
2023-10-27 11:18:48 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/20] train_loss: 0.3528, val_loss: 0.3431, val_r2: 0.6603, lr: 0.000082, 7.3s
2023-10-27 11:18:55 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/20] train_loss: 0.2693, val_loss: 0.2732, val_r2: 0.7296, lr: 0.000077, 7.6s
2023-10-27 11:19:04 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/20] train_loss: 0.2528, val_loss: 0.3643, val_r2: 0.6398, lr: 0.000072, 7.5s
2023-10-27 11:19:11 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/20] train_loss: 0.2164, val_loss: 0.2591, val_r2: 0.7436, lr: 0.000067, 7.4s
2023-10-27 11:19:20 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/20] train_loss: 0.1970, val_loss: 0.2915, val_r2: 0.7117, lr: 0.000062, 7.6s
2023-10-27 11:19:27 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [9/20] train_loss: 0.1452, val_loss: 0.2806, val_r2: 0.7222, lr: 0.000057, 7.7s
2023-10-27 11:19:35 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [10/20] train_loss: 0.1310, val_loss: 0.3252, val_r2: 0.6778, lr: 0.000052, 7.4s
2023-10-27 11:19:42 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [11/20] train_loss: 0.1214, val_loss: 0.3129, val_r2: 0.6899, lr: 0.000046, 7.5s
2023-10-27 11:19:50 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [12/20] train_loss: 0.1009, val_loss: 0.3293, val_r2: 0.6739, lr: 0.000041, 7.5s
2023-10-27 11:19:50 | unimol/utils/metrics.py | 243 | WARNING | Uni-Mol(QSAR) | Early stopping at epoch: 12
2023-10-27 11:19:51 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2023-10-27 11:19:52 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 4, result {'r2': 0.7435969685025966, 'mae': 34.783394, 'pearsonr': 0.8652752320161483, 'spearmanr': 0.8533703891931845, 'mse': 2084.7852}
2023-10-27 11:19:52 | unimol/models/nnmodel.py | 144 | INFO | Uni-Mol(QSAR) | Uni-Mol metrics score: 
{'r2': 0.7105567561951571, 'mae': 36.36778393940169, 'pearsonr': 0.843600909421961, 'spearmanr': 0.8417814272843047, 'mse': 2329.504593731551}
2023-10-27 11:19:52 | unimol/models/nnmodel.py | 145 | INFO | Uni-Mol(QSAR) | Uni-Mol & Metric result saved!
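The log above shows each fold stopping before epoch 20. A minimal sketch of patience-based early stopping, assuming training stops once the validation metric has not improved for `patience` consecutive epochs, reproduces the stop epochs reported for fold 0 and fold 3. The function `early_stop_epoch` is a hypothetical illustration, not Uni-Mol's code; the `val_r2` values are copied from the log.

```python
def early_stop_epoch(scores, patience=5):
    """Return (stop_epoch, best_epoch), both 1-based, for a higher-is-better metric.

    stop_epoch is None if the patience limit is never reached.
    """
    best_score = float("-inf")
    best_epoch = 0
    for epoch, score in enumerate(scores, start=1):
        if score > best_score:
            # New best score: remember it and reset the patience window.
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs in a row.
            return epoch, best_epoch
    return None, best_epoch


# val_r2 per epoch, copied from the training log for fold 0 and fold 3.
fold0_val_r2 = [0.1709, 0.1258, 0.6547, 0.7152, 0.5966,
                0.6557, 0.7136, 0.6343, 0.6667]
fold3_val_r2 = [0.4533, 0.5887, 0.6556, 0.5821, 0.6698, 0.6838,
                0.4778, 0.6282, 0.6708, 0.6111, 0.6209]

print(early_stop_epoch(fold0_val_r2))  # (9, 4)
print(early_stop_epoch(fold3_val_r2))  # (11, 6)
```

The best epochs found this way (4 and 6) are also the epochs whose `val_r2` matches the per-fold `r2` in the log (0.7152 and 0.6838), which is consistent with the best checkpoint being reloaded after stopping.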

Step 5: Predict Melting Point

[5]

# Evaluate the trained model on the sampled test set by plotting
# experimental vs. predicted melting points.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from unimol import MolPredict

# Load the trained model
clf = MolPredict(load_model='./data/mp_train')

# Predict on the sampled test set and flatten the result to a 1-D array
predict = clf.predict('./data/mp_test_0.1.csv').reshape(-1)

# Read the experimental "TARGET" values (melting points)
test_set = pd.read_csv("./data/mp_test_0.1.csv", header='infer')
test_mp = test_set["TARGET"].to_numpy()

# Use one shared range for both axes so the y = x line is the diagonal
xmin = min(predict.min(), test_mp.min())
xmax = max(predict.max(), test_mp.max())

# Set the plot style before creating the figure.
# On Matplotlib 3.6 and later this style is named 'seaborn-v0_8-darkgrid'.
plt.style.use('seaborn-darkgrid')
plt.figure(figsize=(8, 8))
plt.xlim(xmin, xmax)
plt.ylim(xmin, xmax)
plt.xlabel('Predicted Melting Point (℃)', fontsize=14)
plt.ylabel('Experimental Melting Point (℃)', fontsize=14)
plt.title('Experimental vs Predicted Melting Point', fontsize=16)

# Scatter plot of predicted vs. experimental values
plt.scatter(predict, test_mp, color='blue', alpha=0.6)

# Dashed y = x line: points on it are perfect predictions
x = np.linspace(*plt.xlim())
plt.plot(x, x, color='red', linestyle='--', linewidth=2)

# Display the plot
plt.show()

<Figure size 576x576 with 1 Axes>
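The scatter plot gives a visual impression of the fit; the same comparison can be summarized numerically with the metrics that appear in the training log (MAE, MSE-based error, and R-squared). The sketch below computes them with NumPy. The function name `regression_report` is hypothetical, and the four value pairs are toy numbers for illustration only, not results from this notebook.

```python
import numpy as np


def regression_report(y_true, y_pred):
    """Compute MAE, RMSE and R-squared for experimental vs. predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred
    ss_res = float(np.sum(residual ** 2))                    # residual sum of squares
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))    # total sum of squares
    return {
        "mae": float(np.mean(np.abs(residual))),
        "rmse": float(np.sqrt(ss_res / y_true.size)),
        "r2": 1.0 - ss_res / ss_tot,
    }


# Toy values for illustration only, not results from this notebook.
experimental = [0.0, 50.0, 100.0, 150.0]
predicted = [10.0, 40.0, 110.0, 140.0]

report = regression_report(experimental, predicted)
print(report)
```

In the notebook, calling `regression_report(test_mp, predict)` after the prediction cell would give test-set numbers that can be compared against the cross-validation scores in the Step 4 log.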

Practice

As an exercise, run the workflow again with the entire training set and test set, then compare the prediction results of the two runs.
