Uni-Mol Property Prediction Practice - Regression Task - Dielectric Constant of Electrolyte Molecule

Tags: Uni-Mol, English, electrolyte, dielectric constant

Yani Guan

Updated 2024-09-17

Recommended image: Uni-Mol:unimol-qsar:v0.5

Recommended machine type: c3_m4_1 * NVIDIA T4

©️ Copyright 2023 @ Authors
Authors: Wentao Guo, Hongshuai Wang
Date: 2023-06-06
License: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Quick start: click the "Start Connect" button above, choose the unimol-qsar:0612 image and any GPU node configuration, and wait a moment before running the notebook.


AIMS:

1. Apply Uni-Mol to a specific practical scenario
2. Understand the working modules of Uni-Mol
3. Train a model and predict physicochemical properties using molecular coordinates as input

Case Background

The dielectric constant is also called the relative permittivity (or static relative permittivity) and is written $\epsilon_{r}$. It describes how strongly a material polarizes in response to an applied electric field. It is a dimensionless quantity that indicates how strongly a medium responds to an electric field. The absolute permittivity $\epsilon$ is measured in farads per meter in the International System of Units. It equals the product of the relative permittivity $\epsilon_{r}$ and the vacuum permittivity $\epsilon_{0} \approx 8.854 \times 10^{-12}\ \mathrm{F/m}$:

$$\epsilon = \epsilon_{r}\,\epsilon_{0}$$
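As a quick numerical check of the relation $\epsilon = \epsilon_{r}\epsilon_{0}$, the sketch below converts a relative permittivity into an absolute permittivity. The value eps_r = 3.0 is an arbitrary illustrative number and is not taken from the dataset. EPS_0 is the rounded CODATA value of the vacuum permittivity.

```python
# Vacuum permittivity in F/m (CODATA value)
EPS_0 = 8.8541878128e-12

def absolute_permittivity(eps_r: float) -> float:
    """Return the absolute permittivity (F/m) for a dimensionless relative permittivity."""
    return eps_r * EPS_0

# Illustrative (hypothetical) relative permittivity, not from the dataset
eps_r = 3.0
eps_abs = absolute_permittivity(eps_r)
print(f"eps = {eps_abs:.4e} F/m")
```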
The dielectric constant of electrolyte molecules is a physical quantity that measures how molecules in an electrolyte solution respond to an electric field. It significantly affects the properties of the electrolyte solution and various electrochemical processes, influencing ion solubility, ion mobility, the conductivity of the electrolyte solution, activation energy of electrolysis reactions, and the stability of coordination reactions. Different applications require selecting electrolytes with appropriate dielectric constants to meet specific performance needs.
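One standard way to see why the dielectric constant affects ion dissociation comes from general electrostatics, not from this notebook. In a medium of relative permittivity eps_r, the Coulomb energy of two point charges is $U = q_1 q_2 / (4\pi\epsilon_0\epsilon_r r)$, which is a factor of eps_r weaker than in vacuum. The sketch compares this energy for a hypothetical ion pair 0.5 nm apart in a low-permittivity and a high-permittivity medium. The separation and both eps_r values are illustrative assumptions.

```python
import math

EPS_0 = 8.8541878128e-12    # vacuum permittivity, F/m
E_CHARGE = 1.602176634e-19  # elementary charge, C
K_B = 1.380649e-23          # Boltzmann constant, J/K

def coulomb_energy(q1: float, q2: float, r: float, eps_r: float) -> float:
    """Coulomb energy (J) of two point charges (C) at distance r (m) in a medium of relative permittivity eps_r."""
    return q1 * q2 / (4 * math.pi * EPS_0 * eps_r * r)

r = 0.5e-9  # hypothetical ion separation: 0.5 nm
for eps_r in (5.0, 80.0):  # illustrative low- and high-permittivity media
    u = coulomb_energy(E_CHARGE, -E_CHARGE, r, eps_r)
    # Express the attraction in units of the thermal energy k_B*T at 298.15 K
    print(f"eps_r = {eps_r:5.1f}: U = {u / (K_B * 298.15):.1f} kT")
```

A larger eps_r means a weaker ion-ion attraction relative to thermal energy, which is one reason high-permittivity solvents favor salt dissociation.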
In this case, we will use Uni-Mol to predict the dielectric constants of molecules, aiming to:
1. Learn a training method that uses molecular coordinates instead of SMILES strings as input.
2. Apply regression models to predict continuous values.
3. Use the trained model to predict the dielectric constants of certain molecules.


Step 1: Load the Data

The dataset consists of two pkl files, a training set and a test set, containing EPS (dielectric constant) values and atomic coordinates. The training set contains 500 molecules:
https://dp-public.oss-cn-beijing.aliyuncs.com/community/courses/eps_data_test.pkl
https://dp-public.oss-cn-beijing.aliyuncs.com/community/courses/eps_data_train.pkl
For training, the input file must contain the target values ("target"), the atom types ("atoms", as element symbols), and the atomic coordinates ("coord", as Cartesian XYZ coordinates).
In the EPS task, the target is a continuous value: the relative dielectric constant.
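To prepare your own data in the same layout, the sketch below builds a dict with the keys "target", "atoms", and "coord" and pickles it. It assumes each value is a pandas Series. The notebook later calls .to_list() on these entries, which suggests this layout. The two molecules are approximate water and carbon monoxide geometries. Their target values are placeholders, not measurements. The file name my_eps_data.pkl is hypothetical.

```python
import pickle

import numpy as np
import pandas as pd

# Element symbols per molecule (approximate water and carbon monoxide)
atoms = [["O", "H", "H"], ["C", "O"]]
# Cartesian coordinates in Angstrom: one (x, y, z) row per atom
coords = [
    np.array([[0.000, 0.000, 0.000],
              [0.757, 0.586, 0.000],
              [-0.757, 0.586, 0.000]]),
    np.array([[0.000, 0.000, 0.000],
              [1.128, 0.000, 0.000]]),
]
targets = [10.0, 2.0]  # placeholder target values, not real measurements

data = {
    "target": pd.Series(targets),
    "atoms": pd.Series(atoms, dtype=object),   # variable-length lists per molecule
    "coord": pd.Series(coords, dtype=object),  # N x 3 arrays of varying N
}

with open("my_eps_data.pkl", "wb") as f:  # hypothetical output file
    pickle.dump(data, f)

with open("my_eps_data.pkl", "rb") as f:
    loaded = pickle.load(f)
print(loaded.keys())
print(loaded["coord"][0].shape)
```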


At this point, you might be wondering what exactly a pkl file is. In the previous BBBP case, the data files were in the widely used CSV format, which can easily be viewed in Excel. Pickle (usually with the .pkl file extension) and CSV are two common data storage formats, each with its own advantages and suitable use cases.

If you are unfamiliar with the pkl format, let's look at why we use it to package molecular coordinate data.

Pickle is a Python-specific binary serialization format that can store almost any Python object. This includes lists, dictionaries, sets, NumPy arrays, pandas objects, and instances of custom classes. Functions and classes themselves are stored by reference to their importable names, not as code. Complex data structures can therefore be saved directly to a Pickle file and reloaded later without additional processing, which is why Pickle is often used to store objects such as trained machine learning models. Molecular data fit this well. Each molecule has N atom types, N × 3 atomic coordinates, and a single target value. This is a typical "many-to-one" structure whose size varies from molecule to molecule, and Pickle packages and manages it more easily than a flat table.
CSV, on the other hand, is a simple text format primarily used to store tabular data. Each line in a CSV file corresponds to a row in the table, and each field is separated by commas. Since CSV is a plain text format, it has excellent compatibility and readability and can be read by almost any data processing software or programming language. However, CSV can only store two-dimensional tabular data and cannot directly store more complex structures. For tasks where a SMILES string corresponds to a single prediction value, CSV is easier to edit and visualize, and it works well for storing such "one-to-one" data structures.
Therefore, for cases requiring the storage of complex data structures, Pickle is a better choice. For simpler two-dimensional tabular data, which might need to be manually reviewed or shared across different software or programming languages, CSV is the better option.
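The difference is easy to see in a small experiment. A per-molecule coordinate list survives a pickle round trip as a nested Python object. A CSV round trip returns it as a plain string that would have to be parsed again. The molecule below is a made-up example, and an in-memory buffer stands in for a file.

```python
import io
import pickle

import pandas as pd

# One made-up molecule: a SMILES string plus a nested list of XYZ coordinates
df = pd.DataFrame({
    "smiles": ["CO"],
    "coord": [[[0.0, 0.0, 0.0], [1.43, 0.0, 0.0]]],
})

# CSV round trip: the nested list is written as text and read back as a str
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
from_csv = pd.read_csv(buf)
print(type(from_csv.loc[0, "coord"]))  # <class 'str'>

# Pickle round trip: the nested list comes back unchanged
from_pickle = pickle.loads(pickle.dumps(df))
print(type(from_pickle.loc[0, "coord"]))  # <class 'list'>
```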


!wget https://dp-public.oss-cn-beijing.aliyuncs.com/community/courses/eps_data_test.pkl
!wget https://dp-public.oss-cn-beijing.aliyuncs.com/community/courses/eps_data_train.pkl

--2024-09-17 14:09:58--  https://dp-public.oss-cn-beijing.aliyuncs.com/community/courses/eps_data_test.pkl
Resolving ga.dp.tech (ga.dp.tech)... 10.255.254.37, 10.255.254.18, 10.255.254.7
Connecting to ga.dp.tech (ga.dp.tech)|10.255.254.37|:8118... connected.
Proxy request sent, awaiting response... 200 OK
Length: 36644 (36K) [application/octet-stream]
Saving to: ‘eps_data_test.pkl’

eps_data_test.pkl   100%[===================>]  35.79K  --.-KB/s    in 0.003s  

2024-09-17 14:09:58 (10.4 MB/s) - ‘eps_data_test.pkl’ saved [36644/36644]

--2024-09-17 14:09:59--  https://dp-public.oss-cn-beijing.aliyuncs.com/community/courses/eps_data_train.pkl
Resolving ga.dp.tech (ga.dp.tech)... 10.255.254.18, 10.255.254.7, 10.255.254.37
Connecting to ga.dp.tech (ga.dp.tech)|10.255.254.18|:8118... connected.
Proxy request sent, awaiting response... 200 OK
Length: 173342 (169K) [application/octet-stream]
Saving to: ‘eps_data_train.pkl’

eps_data_train.pkl  100%[===================>] 169.28K  --.-KB/s    in 0.03s   

2024-09-17 14:10:00 (6.22 MB/s) - ‘eps_data_train.pkl’ saved [173342/173342]


import pickle  # used to load (unpickle) the .pkl file

# Pickle files must be opened in binary mode ('rb'); text mode would make pickle.load fail
with open('eps_data_train.pkl', 'rb') as f:
    eps_train = pickle.load(f)  # the file is loaded as a Python dict

print(eps_train.keys())
# print(eps_train["target"])
# print(eps_train["atoms"])
# print(eps_train["coord"])

dict_keys(['target', 'atoms', 'coord'])
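Before training, it can be worth checking that each molecule's atom list and coordinate array agree. The helper check_dataset below is a hypothetical sketch, not part of Uni-Mol. It assumes the layout printed above, with keys "target", "atoms", and "coord", and with each coordinate entry convertible to an N × 3 array. It is demonstrated on a tiny made-up dataset. On the real data you would call check_dataset(eps_train).

```python
import numpy as np

def check_dataset(data):
    """Hypothetical sanity check: return indices of molecules whose atoms and coordinates disagree."""
    n = len(data["atoms"])
    if not (len(data["target"]) == len(data["coord"]) == n):
        raise ValueError("target, atoms and coord must have the same number of molecules")
    bad = []
    for i, (atoms, coord) in enumerate(zip(data["atoms"], data["coord"])):
        xyz = np.asarray(coord, dtype=float)
        # Each molecule needs exactly one (x, y, z) row per atom
        if xyz.shape != (len(atoms), 3):
            bad.append(i)
    return bad

# Made-up example: the second molecule lists 2 atoms but only 1 coordinate row
toy = {
    "target": [1.0, 2.0],
    "atoms": [["O", "H", "H"], ["C", "O"]],
    "coord": [np.zeros((3, 3)), np.zeros((1, 3))],
}
print(check_dataset(toy))  # [1]
```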


Step 2: Import Uni-Mol


from unimol import MolTrain, MolPredict
import numpy as np

/opt/conda/lib/python3.8/site-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm


Step 3: Input Data and Training

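Note that the loaded data must be converted into the custom_data format that MolTrain reads. The "atoms" and "coordinates" entries must be plain Python lists, so the loaded pandas Series are converted with .to_list(). Also note that the coordinates key is "coordinates" in custom_data rather than "coord".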


print(eps_train["target"].shape)
print(eps_train["atoms"].shape)
print(eps_train["coord"].shape)

(500,)
(500,)
(500,)


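# Assemble the training data in the dict format passed to MolTrain.fit
custom_data = {'target': eps_train["target"],
               'atoms': eps_train["atoms"].to_list(),        # per-molecule lists of element symbols
               'coordinates': eps_train["coord"].to_list(),  # per-molecule XYZ coordinates
               'target_scaler': "none"}

clf = MolTrain(task='regression',        # continuous target
               data_type='molecule',
               epochs=10,
               learning_rate=0.0001,
               batch_size=16,
               early_stopping=5,         # early-stopping patience, in epochs
               metrics='mae',            # validation metric: mean absolute error
               split='random',           # random split into cross-validation folds (the log shows 5 folds)
               save_path='./eps_train',  # trained models and results are saved here
               )

clf.fit(custom_data)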

2024-09-17 14:10:24 | unimol/data/datareader.py | 147 | INFO | Uni-Mol(QSAR) | Anomaly clean with 3 sigma threshold: 500 -> 488
2024-09-17 14:10:24 | unimol/train.py | 105 | INFO | Uni-Mol(QSAR) | Output directory already exists: ./eps_train
2024-09-17 14:10:24 | unimol/train.py | 106 | INFO | Uni-Mol(QSAR) | Warning: Overwrite output directory: ./eps_train
2024-09-17 14:10:25 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2024-09-17 14:10:25 | unimol/models/nnmodel.py | 103 | INFO | Uni-Mol(QSAR) | start training Uni-Mol:unimolv1
2024-09-17 14:10:37 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/10] train_loss: 0.8157, val_loss: 0.8609, val_mae: 7.5395, lr: 0.000093, 10.1s
2024-09-17 14:10:41 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/10] train_loss: 0.4906, val_loss: 0.4689, val_mae: 5.0604, lr: 0.000082, 3.1s
2024-09-17 14:10:45 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/10] train_loss: 0.3276, val_loss: 0.4049, val_mae: 6.5102, lr: 0.000072, 3.1s
2024-09-17 14:10:48 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/10] train_loss: 0.2424, val_loss: 0.1800, val_mae: 3.6547, lr: 0.000062, 3.1s
2024-09-17 14:10:52 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/10] train_loss: 0.1729, val_loss: 0.1090, val_mae: 2.7991, lr: 0.000052, 3.1s
2024-09-17 14:10:55 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/10] train_loss: 0.1399, val_loss: 0.0865, val_mae: 2.4446, lr: 0.000041, 3.0s
2024-09-17 14:10:59 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/10] train_loss: 0.1043, val_loss: 0.1115, val_mae: 2.8930, lr: 0.000031, 3.1s
2024-09-17 14:11:02 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/10] train_loss: 0.0893, val_loss: 0.0938, val_mae: 2.6660, lr: 0.000021, 3.1s
2024-09-17 14:11:05 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [9/10] train_loss: 0.0830, val_loss: 0.0986, val_mae: 3.1742, lr: 0.000010, 3.0s
2024-09-17 14:11:08 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [10/10] train_loss: 0.0679, val_loss: 0.0812, val_mae: 2.6569, lr: 0.000000, 3.0s
2024-09-17 14:11:09 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-09-17 14:11:09 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 0, result {'mae': 2.444621, 'pearsonr': 0.9725461719263702, 'spearmanr': 0.9123546363543582, 'mse': 13.380485, 'r2': 0.9409195791531426}
2024-09-17 14:11:10 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2024-09-17 14:11:13 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/10] train_loss: 1.1724, val_loss: 0.4726, val_mae: 6.1630, lr: 0.000093, 3.1s
2024-09-17 14:11:17 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/10] train_loss: 0.6891, val_loss: 0.2480, val_mae: 3.9029, lr: 0.000082, 3.1s
2024-09-17 14:11:20 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/10] train_loss: 0.4986, val_loss: 0.1703, val_mae: 3.6059, lr: 0.000072, 3.0s
2024-09-17 14:11:24 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/10] train_loss: 0.3181, val_loss: 0.3630, val_mae: 6.0811, lr: 0.000062, 3.0s
2024-09-17 14:11:27 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/10] train_loss: 0.2108, val_loss: 0.1008, val_mae: 3.0632, lr: 0.000052, 3.1s
2024-09-17 14:11:31 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/10] train_loss: 0.1855, val_loss: 0.1128, val_mae: 2.8973, lr: 0.000041, 3.1s
2024-09-17 14:11:35 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/10] train_loss: 0.1476, val_loss: 0.0534, val_mae: 1.9220, lr: 0.000031, 3.1s
2024-09-17 14:11:39 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/10] train_loss: 0.1106, val_loss: 0.0576, val_mae: 2.0288, lr: 0.000021, 3.1s
2024-09-17 14:11:42 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [9/10] train_loss: 0.1062, val_loss: 0.1186, val_mae: 2.9376, lr: 0.000010, 3.1s
2024-09-17 14:11:45 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [10/10] train_loss: 0.0949, val_loss: 0.0701, val_mae: 2.2602, lr: 0.000000, 3.1s
2024-09-17 14:11:45 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-09-17 14:11:46 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 1, result {'mae': 1.9219519, 'pearsonr': 0.967242943111378, 'spearmanr': 0.9183849126640824, 'mse': 8.067735, 'r2': 0.9317718623963656}
2024-09-17 14:11:46 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2024-09-17 14:11:50 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/10] train_loss: 0.8270, val_loss: 1.0078, val_mae: 6.3527, lr: 0.000093, 3.1s
2024-09-17 14:11:53 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/10] train_loss: 0.4959, val_loss: 0.7816, val_mae: 6.1435, lr: 0.000082, 3.1s
2024-09-17 14:11:57 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/10] train_loss: 0.2808, val_loss: 0.3553, val_mae: 4.0297, lr: 0.000072, 3.1s
2024-09-17 14:12:01 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/10] train_loss: 0.2008, val_loss: 0.3047, val_mae: 3.5471, lr: 0.000062, 3.0s
2024-09-17 14:12:05 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/10] train_loss: 0.1433, val_loss: 0.2868, val_mae: 3.1682, lr: 0.000052, 3.1s
2024-09-17 14:12:08 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/10] train_loss: 0.1394, val_loss: 0.2364, val_mae: 3.4910, lr: 0.000041, 3.1s
2024-09-17 14:12:11 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/10] train_loss: 0.0925, val_loss: 0.2073, val_mae: 2.6632, lr: 0.000031, 3.1s
2024-09-17 14:12:15 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/10] train_loss: 0.0817, val_loss: 0.4173, val_mae: 4.1615, lr: 0.000021, 3.1s
2024-09-17 14:12:18 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [9/10] train_loss: 0.0697, val_loss: 0.3445, val_mae: 3.1116, lr: 0.000010, 3.1s
2024-09-17 14:12:21 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [10/10] train_loss: 0.0690, val_loss: 0.2819, val_mae: 3.1856, lr: 0.000000, 3.0s
2024-09-17 14:12:22 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-09-17 14:12:22 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 2, result {'mae': 2.6632354, 'pearsonr': 0.9128733589905662, 'spearmanr': 0.9502256962347568, 'mse': 31.136843, 'r2': 0.8329210280157063}
2024-09-17 14:12:23 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2024-09-17 14:12:26 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/10] train_loss: 1.0041, val_loss: 0.8644, val_mae: 9.8966, lr: 0.000093, 3.1s
2024-09-17 14:12:30 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/10] train_loss: 0.9185, val_loss: 0.3185, val_mae: 4.4445, lr: 0.000082, 3.4s
2024-09-17 14:12:34 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/10] train_loss: 0.4108, val_loss: 0.4465, val_mae: 6.3175, lr: 0.000072, 3.1s
2024-09-17 14:12:37 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/10] train_loss: 0.2654, val_loss: 0.1721, val_mae: 3.6764, lr: 0.000062, 3.0s
2024-09-17 14:12:40 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/10] train_loss: 0.2126, val_loss: 0.1485, val_mae: 3.1739, lr: 0.000052, 3.1s
2024-09-17 14:12:44 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/10] train_loss: 0.1598, val_loss: 0.0699, val_mae: 2.5844, lr: 0.000041, 3.1s
2024-09-17 14:12:48 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/10] train_loss: 0.1467, val_loss: 0.0962, val_mae: 2.3810, lr: 0.000031, 3.0s
2024-09-17 14:12:52 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/10] train_loss: 0.1026, val_loss: 0.0743, val_mae: 2.8888, lr: 0.000021, 3.2s
2024-09-17 14:12:56 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [9/10] train_loss: 0.0900, val_loss: 0.0598, val_mae: 2.3403, lr: 0.000010, 3.8s
2024-09-17 14:13:00 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [10/10] train_loss: 0.0883, val_loss: 0.0591, val_mae: 2.5604, lr: 0.000000, 3.1s
2024-09-17 14:13:00 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-09-17 14:13:00 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 3, result {'mae': 2.34029, 'pearsonr': 0.9588346089298846, 'spearmanr': 0.9057723981255058, 'mse': 9.400122, 'r2': 0.8903235786705405}
2024-09-17 14:13:01 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2024-09-17 14:13:04 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/10] train_loss: 0.8800, val_loss: 0.4444, val_mae: 5.8270, lr: 0.000093, 3.1s
2024-09-17 14:13:08 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/10] train_loss: 0.5974, val_loss: 0.4910, val_mae: 7.1490, lr: 0.000082, 3.1s
2024-09-17 14:13:11 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/10] train_loss: 0.3656, val_loss: 0.2387, val_mae: 4.2860, lr: 0.000072, 3.0s
2024-09-17 14:13:15 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/10] train_loss: 0.2240, val_loss: 0.1328, val_mae: 3.3094, lr: 0.000062, 3.1s
2024-09-17 14:13:19 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/10] train_loss: 0.1599, val_loss: 0.2003, val_mae: 3.4868, lr: 0.000052, 3.0s
2024-09-17 14:13:22 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/10] train_loss: 0.1062, val_loss: 0.0813, val_mae: 2.2801, lr: 0.000041, 3.1s
2024-09-17 14:13:26 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/10] train_loss: 0.0913, val_loss: 0.1103, val_mae: 2.6971, lr: 0.000031, 3.1s
2024-09-17 14:13:28 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/10] train_loss: 0.0785, val_loss: 0.0794, val_mae: 1.9937, lr: 0.000021, 3.0s
2024-09-17 14:13:32 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [9/10] train_loss: 0.0603, val_loss: 0.0837, val_mae: 2.0807, lr: 0.000010, 3.0s
2024-09-17 14:13:36 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [10/10] train_loss: 0.0534, val_loss: 0.0895, val_mae: 2.3770, lr: 0.000000, 4.0s
2024-09-17 14:13:36 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-09-17 14:13:37 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 4, result {'mae': 1.9936786, 'pearsonr': 0.9519365032853556, 'spearmanr': 0.9179652058074022, 'mse': 13.714977, 'r2': 0.8882068969317306}
2024-09-17 14:13:37 | unimol/models/nnmodel.py | 144 | INFO | Uni-Mol(QSAR) | Uni-Mol metrics score: 
{'mae': 2.2731889115826993, 'pearsonr': 0.9496033854988929, 'spearmanr': 0.924318698428456, 'mse': 15.154713609003048, 'r2': 0.8988214902246847}
2024-09-17 14:13:37 | unimol/models/nnmodel.py | 145 | INFO | Uni-Mol(QSAR) | Uni-Mol & Metric result saved!
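The first log line, "Anomaly clean with 3 sigma threshold: 500 -> 488", reports that 12 training molecules were dropped before training. The sketch below shows what a 3-sigma filter on target values typically looks like. It illustrates the general idea behind that log message and is not Uni-Mol's actual code, which may differ in details such as which statistics it uses. The target values are synthetic, with three extreme values injected on purpose.

```python
import numpy as np

def three_sigma_filter(y):
    """Boolean mask keeping values strictly within mean +/- 3 standard deviations (illustrative sketch)."""
    y = np.asarray(y, dtype=float)
    mean, std = y.mean(), y.std()
    return (y > mean - 3 * std) & (y < mean + 3 * std)

rng = np.random.default_rng(0)
y = rng.normal(loc=20.0, scale=5.0, size=500)  # synthetic targets
y[:3] = [150.0, 160.0, 170.0]                  # inject three extreme values
mask = three_sigma_filter(y)
print(f"{y.size} -> {mask.sum()}")
```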


Step 4: Hyperparameter Tuning (Omitted)

Hyperparameter tuning was covered in earlier cases, so it is not repeated here; for details, refer to the BBBP case.
As an exercise, try adding a code block that tunes the hyperparameters, such as learning_rate, batch_size, or epochs.
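One possible starting point for the exercise is a simple grid search. The sketch below is a hypothetical skeleton. train_and_score stands in for a function you would write yourself. It would train a model with the given settings and return a validation MAE, for example by passing them to MolTrain along with the other arguments from Step 3, using a separate save_path for each run. Here a toy scoring function replaces it so that the code runs on its own. The grid values are arbitrary.

```python
import itertools

def grid_search(train_and_score, grid):
    """Try every combination in `grid`; return (best_params, best_score, all_results). Lower score is better."""
    keys = list(grid)
    results = []
    for values in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        results.append((params, train_and_score(**params)))
    best_params, best_score = min(results, key=lambda item: item[1])
    return best_params, best_score, results

# Hypothetical grid over MolTrain arguments used in Step 3
grid = {
    "learning_rate": [1e-4, 5e-5],
    "batch_size": [16, 32],
    "epochs": [10, 20],
}

def toy_score(learning_rate, batch_size, epochs):
    # Stand-in for a real training run; returns a made-up "MAE"
    return abs(learning_rate - 5e-5) * 1e4 + batch_size / 100 + 10 / epochs

best_params, best_score, results = grid_search(toy_score, grid)
print(best_params, round(best_score, 3))
```

A grid search is the simplest option. With expensive training runs, a smaller random sample of the grid is often a more economical choice.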


Step 5: Read In Molecular Conformations for EPS Prediction

The test set uses the same format as the training set: for each molecule, a sequence of atom types and a sequence of atomic coordinates. Target values are not needed for prediction.


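import pickle

# Load the test set; pickle files must be opened in binary mode ('rb')
with open('eps_data_test.pkl', 'rb') as f:
    eps_test = pickle.load(f)

# Same input format as in training, but without target values
custom_data = {
    'atoms': eps_test["atoms"].to_list(),
    'coordinates': eps_test["coord"].to_list(),
}

# Load the models saved in ./eps_train and predict EPS for the test molecules
clf = MolPredict(load_model='./eps_train')
predict = clf.predict(custom_data)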

2024-09-17 14:43:32 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2024-09-17 14:43:33 | unimol/models/nnmodel.py | 154 | INFO | Uni-Mol(QSAR) | start predict NNModel:unimolv1
2024-09-17 14:43:33 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-09-17 14:43:34 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-09-17 14:43:34 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-09-17 14:43:35 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-09-17 14:43:36 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!


The experimental dielectric constant data of the test set is in the file https://dp-public.oss-cn-beijing.aliyuncs.com/community/courses/eps_test.csv


!wget https://dp-public.oss-cn-beijing.aliyuncs.com/community/courses/eps_test.csv

--2024-09-17 14:44:48--  https://dp-public.oss-cn-beijing.aliyuncs.com/community/courses/eps_test.csv
Resolving ga.dp.tech (ga.dp.tech)... 10.255.254.37, 10.255.254.7, 10.255.254.18
Connecting to ga.dp.tech (ga.dp.tech)|10.255.254.37|:8118... connected.
Proxy request sent, awaiting response... 200 OK
Length: 2190 (2.1K) [text/csv]
Saving to: ‘eps_test.csv’

eps_test.csv        100%[===================>]   2.14K  --.-KB/s    in 0s      

2024-09-17 14:44:48 (328 MB/s) - ‘eps_test.csv’ saved [2190/2190]


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Experimental EPS values of the test molecules
test_set = pd.read_csv("./eps_test.csv")
test_eps = test_set["eps"].to_numpy()
pred_eps = predict.flatten()

# Use the same range on both axes so that y = x is the diagonal
lo = min(pred_eps.min(), test_eps.min())
hi = max(pred_eps.max(), test_eps.max())

# 'seaborn-darkgrid' is named 'seaborn-v0_8-darkgrid' in newer Matplotlib releases
style = 'seaborn-v0_8-darkgrid' if 'seaborn-v0_8-darkgrid' in plt.style.available else 'seaborn-darkgrid'
plt.style.use(style)  # set the style before creating the figure

plt.figure(figsize=(8, 8))
plt.xlim(lo, hi)
plt.ylim(lo, hi)
plt.xlabel(r'Predicted $\epsilon$', fontsize=14)
plt.ylabel(r'Experimental $\epsilon$', fontsize=14)
plt.title(r'Experimental vs Predicted $\epsilon$', fontsize=16)
plt.scatter(pred_eps, test_eps, color='blue', alpha=0.6)

# Parity line: points on it are predicted exactly
x = np.linspace(lo, hi)
plt.plot(x, x, color='red', linestyle='--', linewidth=2)
plt.show()
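To quantify the agreement shown in the parity plot, the predictions can be compared with the experimental values. The sketch uses the same metrics that Uni-Mol reported during cross-validation (MAE, MSE, R², Pearson r). The function regression_metrics is a plain NumPy sketch written for this purpose, not a Uni-Mol API. In the notebook you would call regression_metrics(predict.flatten(), test_eps). The arrays in the usage line are made up.

```python
import numpy as np

def regression_metrics(y_pred, y_true):
    """Return MAE, MSE, R^2 and Pearson r for two 1-D arrays of equal length."""
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    y_true = np.asarray(y_true, dtype=float).ravel()
    if y_pred.shape != y_true.shape:
        raise ValueError("predictions and experimental values must have the same length")
    err = y_pred - y_true
    ss_res = float(np.sum(err ** 2))                       # residual sum of squares
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))  # total sum of squares
    return {
        "mae": float(np.mean(np.abs(err))),
        "mse": ss_res / err.size,
        "r2": 1.0 - ss_res / ss_tot,
        "pearsonr": float(np.corrcoef(y_pred, y_true)[0, 1]),
    }

# Made-up example values
print(regression_metrics([2.0, 4.0, 6.0], [1.0, 4.0, 7.0]))
```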


The plot shows that most predictions agree well with the experimental values, even though the model has never seen these molecules. There are, however, 3 outliers. These are molecules whose features differ significantly from those in the training data, and the model predicts them poorly. If the test data contain abnormal structures, the model may make poor predictions for those points. In such cases, the test data may need further cleaning and preprocessing.
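To find which test molecules are the outliers, one option is to rank them by absolute prediction error. The helper largest_errors is hypothetical. In the notebook you would call it as largest_errors(predict.flatten(), test_eps, k=3), then inspect the matching entries of eps_test["atoms"] and eps_test["coord"]. The example arrays are made up.

```python
import numpy as np

def largest_errors(y_pred, y_true, k=3):
    """Return indices of the k samples with the largest absolute error, largest first."""
    abs_err = np.abs(np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float))
    return np.argsort(abs_err)[::-1][:k]

# Made-up predicted and experimental values
pred = np.array([3.0, 10.0, 7.5, 40.0, 2.0])
exp = np.array([3.2, 25.0, 7.0, 12.0, 2.1])
print(largest_errors(pred, exp, k=2))  # [3 1]
```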
