Uni-Mol Property Prediction Practice - Regression Task - Dielectric Constant of Electrolyte Molecule
©️ Copyright 2023 @ Authors
Authors:
Wentao Guo 📨
Hongshuai Wang 📨
Date: 2023-06-06
License: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Quick start: click the Start Connect button above, choose the unimol-qsar:0612 image and any GPU node configuration, and wait a moment for the notebook to start.
AIMS:
- Practical application of Uni-Mol in specific scenarios
- Understanding the working modules of Uni-Mol
- Model training and prediction of physicochemical properties with molecular coordinates as input
Case Background
The dielectric constant (also called the relative permittivity) is a physical quantity that describes a material's ability to polarize in response to an electric field. It is a dimensionless value that indicates how strongly a medium responds to an electric field. It is also referred to as the static relative permittivity and is represented by the symbol $\varepsilon_r$. The absolute permittivity (measured in farads per meter in the International System of Units) equals the product of the static relative permittivity and the permittivity of vacuum ($\varepsilon_0 \approx 8.854 \times 10^{-12}$ F/m).
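The relation between absolute and relative permittivity described above can be written compactly. The worked line below assumes the commonly quoted relative permittivity of water near room temperature, about 78.4; this value is used only for illustration and is not taken from the dataset in this notebook.

```latex
\varepsilon = \varepsilon_r \, \varepsilon_0,
\qquad \varepsilon_0 \approx 8.854 \times 10^{-12}\ \mathrm{F/m}

\varepsilon_{\mathrm{water}} \approx 78.4 \times 8.854 \times 10^{-12}\ \mathrm{F/m}
\approx 6.94 \times 10^{-10}\ \mathrm{F/m}
```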
The dielectric constant of electrolyte molecules is a physical quantity that measures how molecules in an electrolyte solution respond to an electric field. It significantly affects the properties of the electrolyte solution and various electrochemical processes, influencing ion solubility, ion mobility, the conductivity of the electrolyte solution, activation energy of electrolysis reactions, and the stability of coordination reactions. Different applications require selecting electrolytes with appropriate dielectric constants to meet specific performance needs.
In this case, we will use Uni-Mol to predict the dielectric constants of molecules, aiming to:
- Learn a training method that uses molecular coordinates instead of SMILES strings as input.
- Apply regression models to predict continuous values.
- Use the trained model to predict the dielectric constants of certain molecules.
Step 1: Load the Data
The dataset is provided as pkl files containing EPS values and coordinate information (500 molecules in the training set):
https://dp-public.oss-cn-beijing.aliyuncs.com/community/courses/eps_data_test.pkl
https://dp-public.oss-cn-beijing.aliyuncs.com/community/courses/eps_data_train.pkl
The input file must contain the TARGET, the atom types (atoms, as characters), and the atomic coordinates (coord, as XYZ coordinates).
In the EPS task, the TARGET represents continuous values (relative dielectric constants).
At this point, some of you might be wondering what exactly a pkl file is. In the previous BBBP scenario, our data files used the common CSV format, which can be easily viewed in Excel. Pickle (usually with the .pkl file extension) and CSV are two widely used data storage formats, each with its own advantages and suitable use cases.
If you're unfamiliar with the pkl file format, let’s explore together why we use pkl for molecular coordinate data packaging!
Pickle is a binary serialization format specific to Python that can conveniently store most Python objects, including instances of custom classes. This means that you can save complex data structures (such as lists, dictionaries, sets, and numpy arrays) directly into a Pickle file and reload them when needed without additional processing. Pickle is therefore well suited to storing complex objects such as machine learning models. For molecular data, the atom types (N values), the molecular coordinates (3N values), and the corresponding target (1 value) form a typical "many-to-one" data structure, which Pickle can package and manage more easily.
CSV, on the other hand, is a simple text format primarily used to store tabular data. Each line in a CSV file corresponds to a row in the table, and each field is separated by commas. Since CSV is a plain text format, it has excellent compatibility and readability and can be read by almost any data processing software or programming language. However, CSV can only store two-dimensional tabular data and cannot directly store more complex structures. For tasks where a SMILES string corresponds to a single prediction value, CSV is easier to edit and visualize, and it works well for storing such "one-to-one" data structures.
Therefore, for cases requiring the storage of complex data structures, Pickle is a better choice. For simpler two-dimensional tabular data, which might need to be manually reviewed or shared across different software or programming languages, CSV is the better option.
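The difference between the two formats can be seen with a small round trip. The sketch below uses only the standard library and a toy record with the same three keys as the pkl files; the molecules, coordinates, and target values are placeholders made up for illustration and are not taken from the EPS dataset.

```python
import csv
import io
import json
import pickle

# Toy record (placeholder values): two molecules with different atom counts.
toy = {
    "target": [10.0, 2.0],
    "atoms": [["O", "H", "H"], ["C", "H", "H", "H", "H"]],
    "coord": [
        [[0.000, 0.000, 0.117], [0.000, 0.757, -0.469], [0.000, -0.757, -0.469]],
        [[0.000, 0.000, 0.000], [0.629, 0.629, 0.629], [-0.629, -0.629, 0.629],
         [-0.629, 0.629, -0.629], [0.629, -0.629, -0.629]],
    ],
}

# Pickle: the nested lists survive the round trip unchanged.
restored = pickle.loads(pickle.dumps(toy))
print(restored == toy)

# CSV: every cell is text, so nested data must be encoded (here as JSON)
# on the way in and decoded again on the way out.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["target", "atoms", "coord"])
for target, atoms, coord in zip(toy["target"], toy["atoms"], toy["coord"]):
    writer.writerow([target, json.dumps(atoms), json.dumps(coord)])

buffer.seek(0)
rows = list(csv.DictReader(buffer))
first_atoms = json.loads(rows[0]["atoms"])
print(type(rows[0]["target"]).__name__, first_atoms)
```

One caveat worth keeping in mind: unpickling can execute arbitrary code, so only load pkl files from sources you trust.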
--2024-09-17 14:09:58-- https://dp-public.oss-cn-beijing.aliyuncs.com/community/courses/eps_data_test.pkl
Resolving ga.dp.tech (ga.dp.tech)... 10.255.254.37, 10.255.254.18, 10.255.254.7
Connecting to ga.dp.tech (ga.dp.tech)|10.255.254.37|:8118... connected.
Proxy request sent, awaiting response... 200 OK
Length: 36644 (36K) [application/octet-stream]
Saving to: ‘eps_data_test.pkl’
eps_data_test.pkl 100%[===================>] 35.79K --.-KB/s in 0.003s
2024-09-17 14:09:58 (10.4 MB/s) - ‘eps_data_test.pkl’ saved [36644/36644]
--2024-09-17 14:09:59-- https://dp-public.oss-cn-beijing.aliyuncs.com/community/courses/eps_data_train.pkl
Resolving ga.dp.tech (ga.dp.tech)... 10.255.254.18, 10.255.254.7, 10.255.254.37
Connecting to ga.dp.tech (ga.dp.tech)|10.255.254.18|:8118... connected.
Proxy request sent, awaiting response... 200 OK
Length: 173342 (169K) [application/octet-stream]
Saving to: ‘eps_data_train.pkl’
eps_data_train.pkl 100%[===================>] 169.28K --.-KB/s in 0.03s
2024-09-17 14:10:00 (6.22 MB/s) - ‘eps_data_train.pkl’ saved [173342/173342]
dict_keys(['target', 'atoms', 'coord'])
Step 2: Import Uni-Mol
Step 3: Input Data and Training
- Note that the data type and format need to be converted into a form that custom_data can read, such as eps_train["target"].
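A minimal sketch of this conversion step follows. It maps the three pkl keys onto a custom_data dictionary and checks that every molecule has one XYZ triple per atom. The helper names `to_custom_data` and `train_eps_model` are hypothetical, and the `coordinates` key, the `from unimol import MolTrain` import, and the `MolTrain` arguments are assumptions about the Uni-Mol package installed in this image, so check them against your version before relying on them. The toy record is a placeholder, not data from the EPS files.

```python
def to_custom_data(record):
    """Map a loaded pkl record to the dict layout assumed for custom_data."""
    targets, atoms, coords = record["target"], record["atoms"], record["coord"]
    if not (len(targets) == len(atoms) == len(coords)):
        raise ValueError("target, atoms and coord must have the same length")
    for i, (mol_atoms, mol_coords) in enumerate(zip(atoms, coords)):
        if len(mol_atoms) != len(mol_coords):
            raise ValueError(f"molecule {i}: atom count and coordinate count differ")
        if any(len(xyz) != 3 for xyz in mol_coords):
            raise ValueError(f"molecule {i}: each coordinate needs x, y and z")
    # Assumption: the coordinate array is passed under the key 'coordinates'.
    return {"target": targets, "atoms": atoms, "coordinates": coords}


def train_eps_model(custom_data, save_path="./eps_train"):
    """Hypothetical wrapper; the import and arguments below are assumed."""
    from unimol import MolTrain  # imported lazily so the helper above runs without it

    trainer = MolTrain(
        task="regression",     # continuous target
        data_type="molecule",
        epochs=10,             # matches the 10 epochs per fold seen in the log
        metrics="mae",
        save_path=save_path,
    )
    trainer.fit(custom_data)
    return trainer


# Toy record with placeholder values, only to exercise the conversion.
toy_record = {
    "target": [10.0],
    "atoms": [["O", "H", "H"]],
    "coord": [[[0.0, 0.0, 0.117], [0.0, 0.757, -0.469], [0.0, -0.757, -0.469]]],
}
custom_data = to_custom_data(toy_record)
print(sorted(custom_data))
```

Validating shapes before training is a design choice here: a mismatch between atoms and coordinates is much easier to diagnose at this point than inside the training loop.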
(500,) (500,) (500,)
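The training log that follows reports an anomaly-cleaning step with a 3 sigma threshold that reduces the 500 samples to 488. The exact rule used inside Uni-Mol is not shown in this notebook; the sketch below assumes the common form, which keeps targets within three standard deviations of the mean. The function name `three_sigma_mask` and the toy values are hypothetical.

```python
import statistics


def three_sigma_mask(values, n_sigma=3.0):
    """Return True for values within n_sigma standard deviations of the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [abs(v - mean) <= n_sigma * std for v in values]


# Toy targets (placeholders): 19 moderate values and one extreme value.
toy_targets = [2.0 + 0.5 * i for i in range(19)] + [500.0]
mask = three_sigma_mask(toy_targets)
cleaned = [v for v, keep in zip(toy_targets, mask) if keep]
print(len(toy_targets), "->", len(cleaned))
```

Note that the mean and standard deviation are themselves affected by the extreme values, so with very small samples a single outlier may not exceed the threshold at all.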
2024-09-17 14:10:24 | unimol/data/datareader.py | 147 | INFO | Uni-Mol(QSAR) | Anomaly clean with 3 sigma threshold: 500 -> 488
2024-09-17 14:10:24 | unimol/train.py | 105 | INFO | Uni-Mol(QSAR) | Output directory already exists: ./eps_train
2024-09-17 14:10:24 | unimol/train.py | 106 | INFO | Uni-Mol(QSAR) | Warning: Overwrite output directory: ./eps_train
2024-09-17 14:10:25 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2024-09-17 14:10:25 | unimol/models/nnmodel.py | 103 | INFO | Uni-Mol(QSAR) | start training Uni-Mol:unimolv1
2024-09-17 14:10:37 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/10] train_loss: 0.8157, val_loss: 0.8609, val_mae: 7.5395, lr: 0.000093, 10.1s
2024-09-17 14:10:41 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/10] train_loss: 0.4906, val_loss: 0.4689, val_mae: 5.0604, lr: 0.000082, 3.1s
2024-09-17 14:10:45 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/10] train_loss: 0.3276, val_loss: 0.4049, val_mae: 6.5102, lr: 0.000072, 3.1s
2024-09-17 14:10:48 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/10] train_loss: 0.2424, val_loss: 0.1800, val_mae: 3.6547, lr: 0.000062, 3.1s
2024-09-17 14:10:52 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/10] train_loss: 0.1729, val_loss: 0.1090, val_mae: 2.7991, lr: 0.000052, 3.1s
2024-09-17 14:10:55 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/10] train_loss: 0.1399, val_loss: 0.0865, val_mae: 2.4446, lr: 0.000041, 3.0s
2024-09-17 14:10:59 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/10] train_loss: 0.1043, val_loss: 0.1115, val_mae: 2.8930, lr: 0.000031, 3.1s
2024-09-17 14:11:02 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/10] train_loss: 0.0893, val_loss: 0.0938, val_mae: 2.6660, lr: 0.000021, 3.1s
2024-09-17 14:11:05 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [9/10] train_loss: 0.0830, val_loss: 0.0986, val_mae: 3.1742, lr: 0.000010, 3.0s
2024-09-17 14:11:08 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [10/10] train_loss: 0.0679, val_loss: 0.0812, val_mae: 2.6569, lr: 0.000000, 3.0s
2024-09-17 14:11:09 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-09-17 14:11:09 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 0, result {'mae': 2.444621, 'pearsonr': 0.9725461719263702, 'spearmanr': 0.9123546363543582, 'mse': 13.380485, 'r2': 0.9409195791531426}
2024-09-17 14:11:10 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2024-09-17 14:11:13 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/10] train_loss: 1.1724, val_loss: 0.4726, val_mae: 6.1630, lr: 0.000093, 3.1s
2024-09-17 14:11:17 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/10] train_loss: 0.6891, val_loss: 0.2480, val_mae: 3.9029, lr: 0.000082, 3.1s
2024-09-17 14:11:20 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/10] train_loss: 0.4986, val_loss: 0.1703, val_mae: 3.6059, lr: 0.000072, 3.0s
2024-09-17 14:11:24 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/10] train_loss: 0.3181, val_loss: 0.3630, val_mae: 6.0811, lr: 0.000062, 3.0s
2024-09-17 14:11:27 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/10] train_loss: 0.2108, val_loss: 0.1008, val_mae: 3.0632, lr: 0.000052, 3.1s
2024-09-17 14:11:31 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/10] train_loss: 0.1855, val_loss: 0.1128, val_mae: 2.8973, lr: 0.000041, 3.1s
2024-09-17 14:11:35 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/10] train_loss: 0.1476, val_loss: 0.0534, val_mae: 1.9220, lr: 0.000031, 3.1s
2024-09-17 14:11:39 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/10] train_loss: 0.1106, val_loss: 0.0576, val_mae: 2.0288, lr: 0.000021, 3.1s
2024-09-17 14:11:42 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [9/10] train_loss: 0.1062, val_loss: 0.1186, val_mae: 2.9376, lr: 0.000010, 3.1s
2024-09-17 14:11:45 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [10/10] train_loss: 0.0949, val_loss: 0.0701, val_mae: 2.2602, lr: 0.000000, 3.1s
2024-09-17 14:11:45 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-09-17 14:11:46 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 1, result {'mae': 1.9219519, 'pearsonr': 0.967242943111378, 'spearmanr': 0.9183849126640824, 'mse': 8.067735, 'r2': 0.9317718623963656}
2024-09-17 14:11:46 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2024-09-17 14:11:50 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/10] train_loss: 0.8270, val_loss: 1.0078, val_mae: 6.3527, lr: 0.000093, 3.1s
2024-09-17 14:11:53 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/10] train_loss: 0.4959, val_loss: 0.7816, val_mae: 6.1435, lr: 0.000082, 3.1s
2024-09-17 14:11:57 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/10] train_loss: 0.2808, val_loss: 0.3553, val_mae: 4.0297, lr: 0.000072, 3.1s
2024-09-17 14:12:01 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/10] train_loss: 0.2008, val_loss: 0.3047, val_mae: 3.5471, lr: 0.000062, 3.0s
2024-09-17 14:12:05 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/10] train_loss: 0.1433, val_loss: 0.2868, val_mae: 3.1682, lr: 0.000052, 3.1s
2024-09-17 14:12:08 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/10] train_loss: 0.1394, val_loss: 0.2364, val_mae: 3.4910, lr: 0.000041, 3.1s
2024-09-17 14:12:11 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/10] train_loss: 0.0925, val_loss: 0.2073, val_mae: 2.6632, lr: 0.000031, 3.1s
2024-09-17 14:12:15 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/10] train_loss: 0.0817, val_loss: 0.4173, val_mae: 4.1615, lr: 0.000021, 3.1s
2024-09-17 14:12:18 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [9/10] train_loss: 0.0697, val_loss: 0.3445, val_mae: 3.1116, lr: 0.000010, 3.1s
2024-09-17 14:12:21 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [10/10] train_loss: 0.0690, val_loss: 0.2819, val_mae: 3.1856, lr: 0.000000, 3.0s
2024-09-17 14:12:22 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-09-17 14:12:22 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 2, result {'mae': 2.6632354, 'pearsonr': 0.9128733589905662, 'spearmanr': 0.9502256962347568, 'mse': 31.136843, 'r2': 0.8329210280157063}
2024-09-17 14:12:23 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2024-09-17 14:12:26 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/10] train_loss: 1.0041, val_loss: 0.8644, val_mae: 9.8966, lr: 0.000093, 3.1s
2024-09-17 14:12:30 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/10] train_loss: 0.9185, val_loss: 0.3185, val_mae: 4.4445, lr: 0.000082, 3.4s
2024-09-17 14:12:34 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/10] train_loss: 0.4108, val_loss: 0.4465, val_mae: 6.3175, lr: 0.000072, 3.1s
2024-09-17 14:12:37 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/10] train_loss: 0.2654, val_loss: 0.1721, val_mae: 3.6764, lr: 0.000062, 3.0s
2024-09-17 14:12:40 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/10] train_loss: 0.2126, val_loss: 0.1485, val_mae: 3.1739, lr: 0.000052, 3.1s
2024-09-17 14:12:44 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/10] train_loss: 0.1598, val_loss: 0.0699, val_mae: 2.5844, lr: 0.000041, 3.1s
2024-09-17 14:12:48 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/10] train_loss: 0.1467, val_loss: 0.0962, val_mae: 2.3810, lr: 0.000031, 3.0s
2024-09-17 14:12:52 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/10] train_loss: 0.1026, val_loss: 0.0743, val_mae: 2.8888, lr: 0.000021, 3.2s
2024-09-17 14:12:56 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [9/10] train_loss: 0.0900, val_loss: 0.0598, val_mae: 2.3403, lr: 0.000010, 3.8s
2024-09-17 14:13:00 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [10/10] train_loss: 0.0883, val_loss: 0.0591, val_mae: 2.5604, lr: 0.000000, 3.1s
2024-09-17 14:13:00 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-09-17 14:13:00 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 3, result {'mae': 2.34029, 'pearsonr': 0.9588346089298846, 'spearmanr': 0.9057723981255058, 'mse': 9.400122, 'r2': 0.8903235786705405}
2024-09-17 14:13:01 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2024-09-17 14:13:04 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [1/10] train_loss: 0.8800, val_loss: 0.4444, val_mae: 5.8270, lr: 0.000093, 3.1s
2024-09-17 14:13:08 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [2/10] train_loss: 0.5974, val_loss: 0.4910, val_mae: 7.1490, lr: 0.000082, 3.1s
2024-09-17 14:13:11 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [3/10] train_loss: 0.3656, val_loss: 0.2387, val_mae: 4.2860, lr: 0.000072, 3.0s
2024-09-17 14:13:15 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [4/10] train_loss: 0.2240, val_loss: 0.1328, val_mae: 3.3094, lr: 0.000062, 3.1s
2024-09-17 14:13:19 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [5/10] train_loss: 0.1599, val_loss: 0.2003, val_mae: 3.4868, lr: 0.000052, 3.0s
2024-09-17 14:13:22 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [6/10] train_loss: 0.1062, val_loss: 0.0813, val_mae: 2.2801, lr: 0.000041, 3.1s
2024-09-17 14:13:26 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [7/10] train_loss: 0.0913, val_loss: 0.1103, val_mae: 2.6971, lr: 0.000031, 3.1s
2024-09-17 14:13:28 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [8/10] train_loss: 0.0785, val_loss: 0.0794, val_mae: 1.9937, lr: 0.000021, 3.0s
2024-09-17 14:13:32 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [9/10] train_loss: 0.0603, val_loss: 0.0837, val_mae: 2.0807, lr: 0.000010, 3.0s
2024-09-17 14:13:36 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [10/10] train_loss: 0.0534, val_loss: 0.0895, val_mae: 2.3770, lr: 0.000000, 4.0s
2024-09-17 14:13:36 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-09-17 14:13:37 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 4, result {'mae': 1.9936786, 'pearsonr': 0.9519365032853556, 'spearmanr': 0.9179652058074022, 'mse': 13.714977, 'r2': 0.8882068969317306}
2024-09-17 14:13:37 | unimol/models/nnmodel.py | 144 | INFO | Uni-Mol(QSAR) | Uni-Mol metrics score: {'mae': 2.2731889115826993, 'pearsonr': 0.9496033854988929, 'spearmanr': 0.924318698428456, 'mse': 15.154713609003048, 'r2': 0.8988214902246847}
2024-09-17 14:13:37 | unimol/models/nnmodel.py | 145 | INFO | Uni-Mol(QSAR) | Uni-Mol & Metric result saved!
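The five fold results in the log above can be summarized by hand to see how much the validation error varies between folds. The sketch below copies the per-fold `mae` and `r2` values from the log and computes their mean and sample standard deviation with the standard library. The overall score reported in the log (MAE about 2.273, R² about 0.899) differs slightly from these fold means; presumably it is computed on the pooled out-of-fold predictions, but that is an assumption, since the notebook does not show how it is derived.

```python
import statistics

# Per-fold validation results copied from the training log above.
fold_results = [
    {"mae": 2.444621, "r2": 0.9409195791531426},
    {"mae": 1.9219519, "r2": 0.9317718623963656},
    {"mae": 2.6632354, "r2": 0.8329210280157063},
    {"mae": 2.34029, "r2": 0.8903235786705405},
    {"mae": 1.9936786, "r2": 0.8882068969317306},
]


def summarize(metric):
    """Mean and sample standard deviation of one metric across folds."""
    values = [fold[metric] for fold in fold_results]
    return statistics.fmean(values), statistics.stdev(values)


mae_mean, mae_std = summarize("mae")
r2_mean, r2_std = summarize("r2")
print(f"MAE: {mae_mean:.3f} +/- {mae_std:.3f}")
print(f"R2:  {r2_mean:.3f} +/- {r2_std:.3f}")
```

Comparing the log entries also shows that each fold's reported `mae` equals the lowest `val_mae` among its ten epochs (for example, epoch 6 for fold 0), which suggests that the best checkpoint per fold is the one evaluated.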
Step 5: Read in Molecular Conformations for EPS Prediction
The test set uses the same data format as the training set: for each molecule, a sequence of atom types plus the matching sequence of atomic coordinates.
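The prediction step could look roughly like the sketch below. Everything that touches Uni-Mol here is an assumption about the installed package: the `from unimol import MolPredict` import, the `load_model` argument, the `predict` method, and the `coordinates` key. The function names `predict_eps` and `average_fold_predictions` are hypothetical. The averaging helper illustrates one plausible way that predictions from five fold models (the log below shows five "load model success!" lines) might be combined; whether Uni-Mol uses a plain mean is not confirmed by this notebook.

```python
def predict_eps(test_record, model_dir="./eps_train"):
    """Hypothetical wrapper around the assumed MolPredict API."""
    from unimol import MolPredict  # lazy import; assumed package layout

    custom_data = {
        "target": test_record["target"],
        "atoms": test_record["atoms"],
        "coordinates": test_record["coord"],  # assumed key name
    }
    predictor = MolPredict(load_model=model_dir)
    return predictor.predict(custom_data)


def average_fold_predictions(fold_predictions):
    """Average per-molecule predictions over folds (assumed ensembling rule)."""
    n_folds = len(fold_predictions)
    return [sum(per_molecule) / n_folds for per_molecule in zip(*fold_predictions)]


# Placeholder numbers: two folds, two molecules.
ensemble = average_fold_predictions([[1.0, 4.0], [3.0, 6.0]])
print(ensemble)
```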
2024-09-17 14:43:32 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2024-09-17 14:43:33 | unimol/models/nnmodel.py | 154 | INFO | Uni-Mol(QSAR) | start predict NNModel:unimolv1
2024-09-17 14:43:33 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-09-17 14:43:34 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-09-17 14:43:34 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-09-17 14:43:35 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-09-17 14:43:36 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
The experimental dielectric constant data of the test set is in the file https://dp-public.oss-cn-beijing.aliyuncs.com/community/courses/eps_test.csv
--2024-09-17 14:44:48-- https://dp-public.oss-cn-beijing.aliyuncs.com/community/courses/eps_test.csv
Resolving ga.dp.tech (ga.dp.tech)... 10.255.254.37, 10.255.254.7, 10.255.254.18
Connecting to ga.dp.tech (ga.dp.tech)|10.255.254.37|:8118... connected.
Proxy request sent, awaiting response... 200 OK
Length: 2190 (2.1K) [text/csv]
Saving to: ‘eps_test.csv’
eps_test.csv 100%[===================>] 2.14K --.-KB/s in 0s
2024-09-17 14:44:48 (328 MB/s) - ‘eps_test.csv’ saved [2190/2190]
When predicting the EPS values of molecules that the model has never seen before, most of the data are predicted well. There are 3 outliers (samples whose feature values differ significantly from those in the training data), indicating that the model has poor predictive ability for these 3 molecules. If the test data contains abnormal structures, the model may make poor predictions for those points. In such cases, we may need to further clean and preprocess the test data.
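To make this comparison quantitative, predicted and experimental values can be scored and the worst-predicted molecules listed for inspection. The sketch below is a minimal pure-Python version; the function names `regression_report` and `largest_errors` are hypothetical, and the numbers are placeholders rather than values from eps_test.csv or from the model's actual predictions.

```python
import math


def regression_report(y_true, y_pred):
    """Return MAE, RMSE and R^2 for paired true and predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        "r2": 1.0 - ss_res / ss_tot,
    }


def largest_errors(y_true, y_pred, k=3):
    """Indices of the k molecules with the largest absolute error."""
    order = sorted(
        range(len(y_true)),
        key=lambda i: abs(y_pred[i] - y_true[i]),
        reverse=True,
    )
    return order[:k]


# Placeholder values: the last molecule is deliberately badly predicted.
experimental = [2.0, 5.0, 10.0, 20.0, 40.0]
predicted = [2.5, 4.0, 11.0, 19.0, 25.0]

report = regression_report(experimental, predicted)
worst = largest_errors(experimental, predicted, k=1)
print(report, worst)
```

Listing the worst cases by index makes it straightforward to look up the corresponding structures and check whether they resemble anything in the training set.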