Uni-Mol Property Prediction Practice - Regression Task - Dielectric Constant of Electrolyte Molecule
Uni-Mol
English
electrolyte
dielectric constant
Yani Guan
Updated on 2024-09-17
Recommended image: Uni-Mol:unimol-qsar:v0.5
Recommended machine type: c3_m4_1 * NVIDIA T4
Uni-Mol Property Prediction Practice - Regression Task - Dielectric Constant of Electrolyte Molecule
Case Backgrounds
Step 1: Load the Data
Step 2: Import Uni-Mol
Step 3: Input Data and Training
Step 4: Fine-tuning Parameters (Omitted)
Step 5: Read in Molecular Conformations for EPS Prediction


©️ Copyright 2023 @ Authors
Authors: Wentao Guo 📨 Hongshuai Wang 📨
Date: 2023-06-06
License: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Quick start: click the Start Connect button above, choose the unimol-qsar:0612 image and any GPU node configuration, then wait a moment for the notebook to start.


AIMS:

  • Practical application of Uni-Mol in specific scenarios
  • Understanding the working modules of Uni-Mol
  • Model training and prediction of physicochemical properties with molecular coordinates as input

Case Backgrounds

  • The dielectric constant (also called the relative permittivity) is a physical quantity that describes how strongly a material polarizes in response to an electric field. It is dimensionless, is also referred to as the static relative permittivity, and is represented by the symbol $\epsilon_r$. The absolute permittivity $\epsilon$ (measured in farads per meter in the International System of Units) equals the product of the static relative permittivity and the vacuum permittivity $\epsilon_0$ (approximately $8.854 \times 10^{-12}$ F/m).

  • The dielectric constant of electrolyte molecules is a physical quantity that measures how molecules in an electrolyte solution respond to an electric field. It significantly affects the properties of the electrolyte solution and various electrochemical processes, influencing ion solubility, ion mobility, the conductivity of the electrolyte solution, activation energy of electrolysis reactions, and the stability of coordination reactions. Different applications require selecting electrolytes with appropriate dielectric constants to meet specific performance needs.

  • In this case, we will use Uni-Mol to predict the dielectric constants of molecules, aiming to:

    1. Learn a training method that uses molecular coordinates instead of SMILES strings as input.
    2. Apply regression models to predict continuous values.
    3. Use the trained model to predict the dielectric constants of certain molecules.
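
The relation described in the background above (absolute permittivity = relative permittivity × vacuum permittivity) can be checked with a few lines of arithmetic. This is a minimal sketch: the function name `absolute_permittivity` and the sample value of 80 are illustrative assumptions, not part of the notebook or its dataset.

```python
# Vacuum permittivity in farads per meter (rounded CODATA value).
EPSILON_0 = 8.8541878128e-12


def absolute_permittivity(relative_permittivity: float) -> float:
    """Return the absolute permittivity (F/m) for a dimensionless dielectric constant."""
    return relative_permittivity * EPSILON_0


# Illustrative dielectric constant only; it is not taken from the dataset.
eps_r = 80.0
print(f"eps_r = {eps_r} -> eps = {absolute_permittivity(eps_r):.3e} F/m")
```

The model in this notebook predicts the dimensionless quantity, so this conversion is only needed when an absolute value in SI units is required.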

Step 1: Load the Data


At this point, some of you might be wondering what exactly a pkl file is. In the previous BBBP scenario, our data files were in the commonly used CSV format, which can be easily viewed in Excel. Pickle (often with the .pkl file extension) and CSV are two commonly used data storage formats, each with its own advantages and suitable use cases.

If you're unfamiliar with the pkl file format, let’s explore together why we use pkl for molecular coordinate data packaging!

  • Pickle is a binary serialization format specific to Python that can conveniently store most Python objects, including instances of custom classes (functions and classes themselves are stored by reference to the module that defines them, and modules cannot be pickled). This means that you can save complex data structures (such as lists, dictionaries, sets, numpy arrays, etc.) directly into a Pickle file and reload them when needed without additional processing. Therefore, Pickle is especially suitable for storing complex objects like machine learning models. In the case of molecular data, where atom types (N), molecular coordinates (3N), and the corresponding prediction target (1) form a typical "many-to-one" data structure, Pickle can package and manage these data more naturally.

  • CSV, on the other hand, is a simple text format primarily used to store tabular data. Each line in a CSV file corresponds to a row in the table, and each field is separated by commas. Since CSV is a plain text format, it has excellent compatibility and readability and can be read by almost any data processing software or programming language. However, CSV can only store two-dimensional tabular data and cannot directly store more complex structures. For tasks where a SMILES string corresponds to a single prediction value, CSV is easier to edit and visualize, and it works well for storing such "one-to-one" data structures.

  • Therefore, for cases requiring the storage of complex data structures, Pickle is a better choice. For simpler two-dimensional tabular data, which might need to be manually reviewed or shared across different software or programming languages, CSV is the better option.
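
The difference between the two formats can be seen in a small round trip. The sketch below uses only the standard library; the single record (three atoms with coordinates and one target) is a made-up illustration with the same key names as the dataset, not data from the course files.

```python
import csv
import io
import pickle

# One hypothetical "many-to-one" record: N atom types, N x 3 coordinates, 1 target.
# All numbers are illustrative and are not taken from eps_data_train.pkl.
record = {
    "atoms": ["O", "H", "H"],
    "coord": [[0.000, 0.000, 0.117], [0.000, 0.757, -0.469], [0.000, -0.757, -0.469]],
    "target": 78.4,
}

# Pickle round trip: the nested structure comes back unchanged.
restored = pickle.loads(pickle.dumps(record))

# CSV round trip: every cell becomes text, so the nested lists are flattened to strings.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["atoms", "coord", "target"])
writer.writerow([record["atoms"], record["coord"], record["target"]])
buffer.seek(0)
rows = list(csv.reader(buffer))
csv_atoms = rows[1][0]

print("pickle preserved the structure:", restored == record)
print("type after pickle:", type(restored["atoms"]).__name__)
print("type after CSV:   ", type(csv_atoms).__name__)
```

After the CSV round trip the atom list is a plain string that would need extra parsing, which is the additional processing that Pickle avoids.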

[1]
!wget https://dp-public.oss-cn-beijing.aliyuncs.com/community/courses/eps_data_test.pkl
!wget https://dp-public.oss-cn-beijing.aliyuncs.com/community/courses/eps_data_train.pkl
--2024-09-17 14:09:58--  https://dp-public.oss-cn-beijing.aliyuncs.com/community/courses/eps_data_test.pkl
Resolving ga.dp.tech (ga.dp.tech)... 10.255.254.37, 10.255.254.18, 10.255.254.7
Connecting to ga.dp.tech (ga.dp.tech)|10.255.254.37|:8118... connected.
Proxy request sent, awaiting response... 200 OK
Length: 36644 (36K) [application/octet-stream]
Saving to: ‘eps_data_test.pkl’

eps_data_test.pkl   100%[===================>]  35.79K  --.-KB/s    in 0.003s  

2024-09-17 14:09:58 (10.4 MB/s) - ‘eps_data_test.pkl’ saved [36644/36644]

--2024-09-17 14:09:59--  https://dp-public.oss-cn-beijing.aliyuncs.com/community/courses/eps_data_train.pkl
Resolving ga.dp.tech (ga.dp.tech)... 10.255.254.18, 10.255.254.7, 10.255.254.37
Connecting to ga.dp.tech (ga.dp.tech)|10.255.254.18|:8118... connected.
Proxy request sent, awaiting response... 200 OK
Length: 173342 (169K) [application/octet-stream]
Saving to: ‘eps_data_train.pkl’

eps_data_train.pkl  100%[===================>] 169.28K  --.-KB/s    in 0.03s   

2024-09-17 14:10:00 (6.22 MB/s) - ‘eps_data_train.pkl’ saved [173342/173342]

[2]
import pickle  # pickle is used to load ("unpickle") the file
# Binary mode ('rb') is required when reading pickle files
with open('eps_data_train.pkl', 'rb') as f:
    eps_train = pickle.load(f)  # load the pkl file as a dict
print(eps_train.keys())
# print(eps_train["target"])
# print(eps_train["atoms"])
# print(eps_train["coord"])
dict_keys(['target', 'atoms', 'coord'])

Step 2: Import Uni-Mol

[3]
from unimol import MolTrain, MolPredict
import numpy as np
/opt/conda/lib/python3.8/site-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm

Step 3: Input Data and Training

  • Note that the data must be converted into the dictionary format used for custom_data: the targets (eps_train["target"]) together with the atom types and coordinates, each converted to a list.
[4]
print(eps_train["target"].shape)
print(eps_train["atoms"].shape)
print(eps_train["coord"].shape)
(500,)
(500,)
(500,)
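
Each of the 500 entries holds one molecule, so the atom list and the coordinate list of a molecule must have the same length. The following is a minimal sketch of a consistency check before training. The helper `build_custom_data` is a hypothetical name introduced here and is not part of Uni-Mol; only the dictionary keys ('atoms', 'coordinates', 'target') are taken from this notebook, and the two toy molecules are made up.

```python
from typing import Dict, Sequence


def build_custom_data(
    atoms: Sequence[Sequence[str]],
    coords: Sequence[Sequence[Sequence[float]]],
    targets: Sequence[float],
) -> Dict[str, list]:
    """Check per-molecule consistency and return a dict with the keys used in this notebook."""
    if not (len(atoms) == len(coords) == len(targets)):
        raise ValueError("atoms, coords and targets need one entry per molecule")
    for i, (mol_atoms, mol_coords) in enumerate(zip(atoms, coords)):
        if len(mol_atoms) != len(mol_coords):
            raise ValueError(
                f"molecule {i}: {len(mol_atoms)} atoms but {len(mol_coords)} coordinates"
            )
        if any(len(xyz) != 3 for xyz in mol_coords):
            raise ValueError(f"molecule {i}: every coordinate needs x, y and z")
    return {
        "atoms": [list(a) for a in atoms],
        "coordinates": [[list(xyz) for xyz in c] for c in coords],
        "target": list(targets),
    }


# Two toy molecules with made-up coordinates and targets (illustration only).
toy_atoms = [["O", "H", "H"], ["C", "O"]]
toy_coords = [
    [[0.0, 0.0, 0.1], [0.0, 0.8, -0.5], [0.0, -0.8, -0.5]],
    [[0.0, 0.0, 0.0], [0.0, 0.0, 1.1]],
]
toy_targets = [78.4, 1.0]

checked = build_custom_data(toy_atoms, toy_coords, toy_targets)
print({key: len(value) for key, value in checked.items()})
```

With the real data, the inputs would be eps_train["atoms"].to_list(), eps_train["coord"].to_list() and eps_train["target"]; the 'target_scaler' entry used in this notebook can be added to the returned dictionary afterwards.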
[5]
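# Pack the training data into the dictionary that is passed to fit()
custom_data = {
    'target': eps_train["target"],                # dielectric constants (regression targets)
    'atoms': eps_train["atoms"].to_list(),        # atom types of each molecule
    'coordinates': eps_train["coord"].to_list(),  # atomic coordinates of each molecule
    'target_scaler': "none",
}

clf = MolTrain(
    task='regression',
    data_type='molecule',
    epochs=10,
    learning_rate=0.0001,
    batch_size=16,
    early_stopping=5,
    metrics='mae',
    split='random',
    save_path='./eps_train',
)
clf.fit(custom_data)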
2024-09-17 14:10:24 | unimol/data/datareader.py | 147 | INFO | Uni-Mol(QSAR) | Anomaly clean with 3 sigma threshold: 500 -> 488
2024-09-17 14:10:24 | unimol/train.py | 105 | INFO | Uni-Mol(QSAR) | Output directory already exists: ./eps_train
2024-09-17 14:10:24 | unimol/train.py | 106 | INFO | Uni-Mol(QSAR) | Warning: Overwrite output directory: ./eps_train
2024-09-17 14:10:25 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2024-09-17 14:10:25 | unimol/models/nnmodel.py | 103 | INFO | Uni-Mol(QSAR) | start training Uni-Mol:unimolv1
2024-09-17 14:10:37 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/10] train_loss: 0.8157, val_loss: 0.8609, val_mae: 7.5395, lr: 0.000093, 10.1s
2024-09-17 14:10:41 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/10] train_loss: 0.4906, val_loss: 0.4689, val_mae: 5.0604, lr: 0.000082, 3.1s
2024-09-17 14:10:45 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/10] train_loss: 0.3276, val_loss: 0.4049, val_mae: 6.5102, lr: 0.000072, 3.1s
2024-09-17 14:10:48 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/10] train_loss: 0.2424, val_loss: 0.1800, val_mae: 3.6547, lr: 0.000062, 3.1s
2024-09-17 14:10:52 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/10] train_loss: 0.1729, val_loss: 0.1090, val_mae: 2.7991, lr: 0.000052, 3.1s
2024-09-17 14:10:55 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/10] train_loss: 0.1399, val_loss: 0.0865, val_mae: 2.4446, lr: 0.000041, 3.0s
2024-09-17 14:10:59 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/10] train_loss: 0.1043, val_loss: 0.1115, val_mae: 2.8930, lr: 0.000031, 3.1s
2024-09-17 14:11:02 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/10] train_loss: 0.0893, val_loss: 0.0938, val_mae: 2.6660, lr: 0.000021, 3.1s
2024-09-17 14:11:05 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [9/10] train_loss: 0.0830, val_loss: 0.0986, val_mae: 3.1742, lr: 0.000010, 3.0s
2024-09-17 14:11:08 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [10/10] train_loss: 0.0679, val_loss: 0.0812, val_mae: 2.6569, lr: 0.000000, 3.0s
2024-09-17 14:11:09 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-09-17 14:11:09 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 0, result {'mae': 2.444621, 'pearsonr': 0.9725461719263702, 'spearmanr': 0.9123546363543582, 'mse': 13.380485, 'r2': 0.9409195791531426}
2024-09-17 14:11:10 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2024-09-17 14:11:13 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/10] train_loss: 1.1724, val_loss: 0.4726, val_mae: 6.1630, lr: 0.000093, 3.1s
2024-09-17 14:11:17 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/10] train_loss: 0.6891, val_loss: 0.2480, val_mae: 3.9029, lr: 0.000082, 3.1s
2024-09-17 14:11:20 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/10] train_loss: 0.4986, val_loss: 0.1703, val_mae: 3.6059, lr: 0.000072, 3.0s
2024-09-17 14:11:24 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/10] train_loss: 0.3181, val_loss: 0.3630, val_mae: 6.0811, lr: 0.000062, 3.0s
2024-09-17 14:11:27 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/10] train_loss: 0.2108, val_loss: 0.1008, val_mae: 3.0632, lr: 0.000052, 3.1s
2024-09-17 14:11:31 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/10] train_loss: 0.1855, val_loss: 0.1128, val_mae: 2.8973, lr: 0.000041, 3.1s
2024-09-17 14:11:35 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/10] train_loss: 0.1476, val_loss: 0.0534, val_mae: 1.9220, lr: 0.000031, 3.1s
2024-09-17 14:11:39 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/10] train_loss: 0.1106, val_loss: 0.0576, val_mae: 2.0288, lr: 0.000021, 3.1s
2024-09-17 14:11:42 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [9/10] train_loss: 0.1062, val_loss: 0.1186, val_mae: 2.9376, lr: 0.000010, 3.1s
2024-09-17 14:11:45 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [10/10] train_loss: 0.0949, val_loss: 0.0701, val_mae: 2.2602, lr: 0.000000, 3.1s
2024-09-17 14:11:45 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-09-17 14:11:46 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 1, result {'mae': 1.9219519, 'pearsonr': 0.967242943111378, 'spearmanr': 0.9183849126640824, 'mse': 8.067735, 'r2': 0.9317718623963656}
2024-09-17 14:11:46 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2024-09-17 14:11:50 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/10] train_loss: 0.8270, val_loss: 1.0078, val_mae: 6.3527, lr: 0.000093, 3.1s
2024-09-17 14:11:53 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/10] train_loss: 0.4959, val_loss: 0.7816, val_mae: 6.1435, lr: 0.000082, 3.1s
2024-09-17 14:11:57 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/10] train_loss: 0.2808, val_loss: 0.3553, val_mae: 4.0297, lr: 0.000072, 3.1s
2024-09-17 14:12:01 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/10] train_loss: 0.2008, val_loss: 0.3047, val_mae: 3.5471, lr: 0.000062, 3.0s
2024-09-17 14:12:05 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/10] train_loss: 0.1433, val_loss: 0.2868, val_mae: 3.1682, lr: 0.000052, 3.1s
2024-09-17 14:12:08 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/10] train_loss: 0.1394, val_loss: 0.2364, val_mae: 3.4910, lr: 0.000041, 3.1s
2024-09-17 14:12:11 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/10] train_loss: 0.0925, val_loss: 0.2073, val_mae: 2.6632, lr: 0.000031, 3.1s
2024-09-17 14:12:15 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/10] train_loss: 0.0817, val_loss: 0.4173, val_mae: 4.1615, lr: 0.000021, 3.1s
2024-09-17 14:12:18 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [9/10] train_loss: 0.0697, val_loss: 0.3445, val_mae: 3.1116, lr: 0.000010, 3.1s
2024-09-17 14:12:21 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [10/10] train_loss: 0.0690, val_loss: 0.2819, val_mae: 3.1856, lr: 0.000000, 3.0s
2024-09-17 14:12:22 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-09-17 14:12:22 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 2, result {'mae': 2.6632354, 'pearsonr': 0.9128733589905662, 'spearmanr': 0.9502256962347568, 'mse': 31.136843, 'r2': 0.8329210280157063}
2024-09-17 14:12:23 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2024-09-17 14:12:26 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/10] train_loss: 1.0041, val_loss: 0.8644, val_mae: 9.8966, lr: 0.000093, 3.1s
2024-09-17 14:12:30 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/10] train_loss: 0.9185, val_loss: 0.3185, val_mae: 4.4445, lr: 0.000082, 3.4s
2024-09-17 14:12:34 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/10] train_loss: 0.4108, val_loss: 0.4465, val_mae: 6.3175, lr: 0.000072, 3.1s
2024-09-17 14:12:37 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/10] train_loss: 0.2654, val_loss: 0.1721, val_mae: 3.6764, lr: 0.000062, 3.0s
2024-09-17 14:12:40 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/10] train_loss: 0.2126, val_loss: 0.1485, val_mae: 3.1739, lr: 0.000052, 3.1s
2024-09-17 14:12:44 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/10] train_loss: 0.1598, val_loss: 0.0699, val_mae: 2.5844, lr: 0.000041, 3.1s
2024-09-17 14:12:48 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/10] train_loss: 0.1467, val_loss: 0.0962, val_mae: 2.3810, lr: 0.000031, 3.0s
2024-09-17 14:12:52 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/10] train_loss: 0.1026, val_loss: 0.0743, val_mae: 2.8888, lr: 0.000021, 3.2s
2024-09-17 14:12:56 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [9/10] train_loss: 0.0900, val_loss: 0.0598, val_mae: 2.3403, lr: 0.000010, 3.8s
2024-09-17 14:13:00 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [10/10] train_loss: 0.0883, val_loss: 0.0591, val_mae: 2.5604, lr: 0.000000, 3.1s
2024-09-17 14:13:00 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-09-17 14:13:00 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 3, result {'mae': 2.34029, 'pearsonr': 0.9588346089298846, 'spearmanr': 0.9057723981255058, 'mse': 9.400122, 'r2': 0.8903235786705405}
2024-09-17 14:13:01 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2024-09-17 14:13:04 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/10] train_loss: 0.8800, val_loss: 0.4444, val_mae: 5.8270, lr: 0.000093, 3.1s
2024-09-17 14:13:08 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/10] train_loss: 0.5974, val_loss: 0.4910, val_mae: 7.1490, lr: 0.000082, 3.1s
2024-09-17 14:13:11 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/10] train_loss: 0.3656, val_loss: 0.2387, val_mae: 4.2860, lr: 0.000072, 3.0s
2024-09-17 14:13:15 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/10] train_loss: 0.2240, val_loss: 0.1328, val_mae: 3.3094, lr: 0.000062, 3.1s
2024-09-17 14:13:19 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/10] train_loss: 0.1599, val_loss: 0.2003, val_mae: 3.4868, lr: 0.000052, 3.0s
2024-09-17 14:13:22 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/10] train_loss: 0.1062, val_loss: 0.0813, val_mae: 2.2801, lr: 0.000041, 3.1s
2024-09-17 14:13:26 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/10] train_loss: 0.0913, val_loss: 0.1103, val_mae: 2.6971, lr: 0.000031, 3.1s
2024-09-17 14:13:28 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/10] train_loss: 0.0785, val_loss: 0.0794, val_mae: 1.9937, lr: 0.000021, 3.0s
2024-09-17 14:13:32 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [9/10] train_loss: 0.0603, val_loss: 0.0837, val_mae: 2.0807, lr: 0.000010, 3.0s
2024-09-17 14:13:36 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [10/10] train_loss: 0.0534, val_loss: 0.0895, val_mae: 2.3770, lr: 0.000000, 4.0s
2024-09-17 14:13:36 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-09-17 14:13:37 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 4, result {'mae': 1.9936786, 'pearsonr': 0.9519365032853556, 'spearmanr': 0.9179652058074022, 'mse': 13.714977, 'r2': 0.8882068969317306}
2024-09-17 14:13:37 | unimol/models/nnmodel.py | 144 | INFO | Uni-Mol(QSAR) | Uni-Mol metrics score: 
{'mae': 2.2731889115826993, 'pearsonr': 0.9496033854988929, 'spearmanr': 0.924318698428456, 'mse': 15.154713609003048, 'r2': 0.8988214902246847}
2024-09-17 14:13:37 | unimol/models/nnmodel.py | 145 | INFO | Uni-Mol(QSAR) | Uni-Mol & Metric result saved!
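
The overall MAE in the log (about 2.2732) is slightly different from the plain average of the five fold MAEs. A possible explanation is that the overall score is computed over all validation samples pooled together, which for MAE equals the mean weighted by fold size. The sketch below checks this arithmetic. The fold sizes are an assumption (488 samples split into five nearly equal folds); the log does not print them.

```python
# Per-fold validation MAE, copied from the "fold k, result" lines in the log above.
fold_mae = [2.444621, 1.9219519, 2.6632354, 2.34029, 1.9936786]

# ASSUMPTION: the 488 cleaned samples are split into folds of 98, 98, 98, 97 and 97.
fold_sizes = [98, 98, 98, 97, 97]


def weighted_mean(values, weights):
    """Mean of `values` where each value counts `weights[i]` times."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


simple = sum(fold_mae) / len(fold_mae)
pooled = weighted_mean(fold_mae, fold_sizes)

print(f"simple mean of fold MAEs:        {simple:.5f}")
print(f"size-weighted mean of fold MAEs: {pooled:.5f}")
```

Under the stated assumption, the size-weighted mean agrees with the reported overall MAE to within rounding. The same shortcut does not hold for R² or the correlation coefficients, which have to be recomputed on the pooled predictions.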

Step 4: Fine-tuning Parameters (Omitted)

  • We have already covered how to tune parameters in previous cases, so we will not go into detail here. For more information, refer to the BBBP case:
    Open In Bohrium

  • As an exercise, please try adding a code block to fine-tune the hyperparameters.
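
One possible starting point for this exercise is a small grid search. The sketch below only enumerates the configurations and runs without Uni-Mol installed; the helper `iter_configs`, the grid values and the `eps_train_search_*` directory names are hypothetical choices made for illustration, not tuned recommendations. The commented-out training call reuses only the MolTrain arguments that already appear in Step 3.

```python
from itertools import product
from typing import Dict, Iterator, List


def iter_configs(grid: Dict[str, List]) -> Iterator[Dict]:
    """Yield one keyword-argument dict per combination of the grid values."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


# Hypothetical search space (illustrative values only).
grid = {
    "learning_rate": [1e-4, 4e-4],
    "batch_size": [8, 16],
    "epochs": [10, 20],
}

configs = list(iter_configs(grid))
for i, cfg in enumerate(configs):
    # Give every run its own output directory so earlier results are not overwritten.
    cfg["save_path"] = f"./eps_train_search_{i}"
    print(cfg)
    # In the notebook, each configuration could then be trained as in Step 3:
    # clf = MolTrain(task='regression', data_type='molecule', early_stopping=5,
    #                metrics='mae', split='random', **cfg)
    # clf.fit(custom_data)
```

The "Uni-Mol metrics score" line that each run prints at the end of training can then be compared across configurations to pick the best one.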


Step 5: Read in Molecular Conformations for EPS Prediction

The test set uses the same data format as the training set: for each molecule, a sequence of atom types plus a sequence of atomic coordinates.

[6]
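import pickle

# Binary mode ('rb') is required when reading pickle files
with open('eps_data_test.pkl', 'rb') as f:
    eps_test = pickle.load(f)

# No 'target' entry is passed for prediction: only atom types and coordinates
custom_data = {
    'atoms': eps_test["atoms"].to_list(),
    'coordinates': eps_test["coord"].to_list(),
}

# Load the models saved in './eps_train' during training and run the prediction
clf = MolPredict(load_model='./eps_train')
predict = clf.predict(custom_data)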
2024-09-17 14:43:32 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2024-09-17 14:43:33 | unimol/models/nnmodel.py | 154 | INFO | Uni-Mol(QSAR) | start predict NNModel:unimolv1
2024-09-17 14:43:33 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-09-17 14:43:34 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-09-17 14:43:34 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-09-17 14:43:35 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-09-17 14:43:36 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
                                                  

The experimental dielectric constant data of the test set is in the file https://dp-public.oss-cn-beijing.aliyuncs.com/community/courses/eps_test.csv

[7]
!wget https://dp-public.oss-cn-beijing.aliyuncs.com/community/courses/eps_test.csv
--2024-09-17 14:44:48--  https://dp-public.oss-cn-beijing.aliyuncs.com/community/courses/eps_test.csv
Resolving ga.dp.tech (ga.dp.tech)... 10.255.254.37, 10.255.254.7, 10.255.254.18
Connecting to ga.dp.tech (ga.dp.tech)|10.255.254.37|:8118... connected.
Proxy request sent, awaiting response... 200 OK
Length: 2190 (2.1K) [text/csv]
Saving to: ‘eps_test.csv’

eps_test.csv        100%[===================>]   2.14K  --.-KB/s    in 0s      

2024-09-17 14:44:48 (328 MB/s) - ‘eps_test.csv’ saved [2190/2190]

[8]
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

test_set = pd.read_csv("./eps_test.csv", header='infer')
test_eps = test_set["eps"].to_numpy()
pred_eps = predict.flatten()

# Use the same range on both axes so that the diagonal marks perfect agreement
xmin = min(pred_eps.min(), test_eps.min())
xmax = max(pred_eps.max(), test_eps.max())

# Select the style before creating the figure so that it applies to the whole figure
plt.style.use('seaborn-darkgrid')
plt.figure(figsize=(8, 8))
plt.xlim(xmin, xmax)
plt.ylim(xmin, xmax)
# Raw strings keep the backslash in \epsilon from being read as an escape sequence
plt.xlabel(r'Predicted $\epsilon$', fontsize=14)
plt.ylabel(r'Experimental $\epsilon$', fontsize=14)
plt.title(r'Experimental vs Predicted $\epsilon$', fontsize=16)
plt.scatter(pred_eps, test_eps, color='blue', alpha=0.6)
x = np.linspace(xmin, xmax)
plt.plot(x, x, color='red', linestyle='--', linewidth=2)  # y = x reference line

plt.show()


The plot shows that, for molecules the model has never seen before, most predictions agree well with the experimental EPS values. Three points are outliers (samples whose features differ significantly from those in the training data), indicating that the model predicts these three molecules poorly. If the test data contains abnormal structures, the model may make poor predictions for those points; in such cases, the test data may need further cleaning and preprocessing.
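
Beyond reading outliers off the plot, the worst predictions can also be located numerically. The following is a minimal sketch using plain Python lists; the helper names `mean_absolute_error` and `largest_errors` and the five toy value pairs are assumptions made for illustration and are not results from this notebook.

```python
from typing import List, Sequence, Tuple


def mean_absolute_error(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Average absolute difference between predictions and reference values."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


def largest_errors(
    predicted: Sequence[float], observed: Sequence[float], top_k: int = 3
) -> List[Tuple[int, float]]:
    """Return (index, absolute error) pairs for the top_k worst predictions."""
    errors = [(i, abs(p - o)) for i, (p, o) in enumerate(zip(predicted, observed))]
    return sorted(errors, key=lambda item: item[1], reverse=True)[:top_k]


# Toy numbers for illustration only. In the notebook, predict.flatten().tolist()
# and test_eps.tolist() could be passed instead.
toy_predicted = [5.1, 20.3, 36.0, 7.9, 64.2]
toy_observed = [4.8, 21.0, 58.5, 8.1, 46.7]

print(f"MAE: {mean_absolute_error(toy_predicted, toy_observed):.2f}")
for index, error in largest_errors(toy_predicted, toy_observed, top_k=2):
    print(f"sample {index}: absolute error {error:.1f}")
```

The indices returned this way point to the molecules whose structures are worth inspecting before deciding whether the test data needs further cleaning.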
