Quantitative Structure-Activity Relationship (QSAR) Model from 0 to 1 (Regression Task)
Deep Learning
RDKit
QSAR
Tutorial
Machine Learning
Uni-Mol
notebook
Scikit-Learn
Yani Guan
Updated on 2024-10-24
Recommended image: Uni-Mol:unimol-qsar:v0.5
Recommended machine type: c12_m92_1 * NVIDIA V100
Quantitative Structure-Activity Relationship (QSAR) Model from 0 to 1 & Uni-Mol Introductory Practice (Regression Task)
Table of Contents
Introduction
Let's Prepare Some Data!
A Brief History of QSAR
Basic Requirements for QSAR Modeling
Basic Workflow of QSAR Modeling
Molecular Representation
1D-QSAR Molecular Representation
2D-QSAR Molecular Characterization
3D-QSAR Molecular Characterization
Uni-Mol Molecular Representation Learning and Pretraining Framework
Pretraining Model
Introduction to Uni-Mol
Results Overview
One More Thing


©️ Copyright 2023 @ Authors
Author: Hang Zheng 📨
Date: 2023-06-16
Sharing Agreement: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Quick Start: Click the Start Connection button above, select the unimol-qsar:v0.2 image and any GPU node configuration, and wait a moment to run.


In recent years, Artificial Intelligence (AI) has been developing at an unprecedented speed, bringing significant breakthroughs and transformations to various fields.

In fact, in the field of drug development, scientists have been using mathematical and statistical methods to aid the drug development process since the last century. Based on the structures of drug molecules, they build mathematical models that predict the drugs' biochemical activity. This approach is known as Quantitative Structure-Activity Relationship (QSAR). QSAR models have continued to evolve with deepening research on drug molecules and the introduction of more AI methods.

It can be said that QSAR models are a good microcosm of the development of the AI for Science field. In this Notebook, we will introduce the construction methods of different types of QSAR models in the form of case studies.


Introduction

Quantitative Structure-Activity Relationship (QSAR) is a method that studies the quantitative relationship between the chemical structure of compounds and their biological activity. It is one of the most important tools in Computer-Aided Drug Design (CADD). QSAR aims to establish mathematical models to relate molecular structures with their biochemical and physicochemical properties, helping drug scientists to make rational predictions about the properties of new drug molecules.

Building an effective QSAR model involves several steps:

  1. Constructing a reasonable molecular representation, which converts molecular structures into computer-readable numerical representations;
  2. Selecting a suitable machine learning model for the molecular representation and using existing molecule-property data to train the model;
  3. Using the trained machine learning model to predict the properties of molecules with unknown properties.

The development of QSAR models has evolved with the progression of molecular representation techniques and the corresponding upgrades in machine learning models. In this notebook, we will introduce the construction methods of different types of QSAR models through case studies.


Let's Prepare Some Data!

To better guide everyone through the process of building QSAR models, we will use the prediction of hERG protein inhibitory capability as a demonstration case.

We can start by downloading the hERG dataset:

[1]
import os
os.makedirs("datasets", exist_ok=True)

!pip install seaborn
!pip install lightgbm
!wget https://dp-public.oss-cn-beijing.aliyuncs.com/community/hERG.csv -O datasets/hERG.csv
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting seaborn
  Downloading seaborn-0.12.2-py3-none-any.whl (293 kB)
     |████████████████████████████████| 293 kB 338 kB/s eta 0:00:01
Requirement already satisfied: matplotlib!=3.6.1,>=3.1 in /opt/conda/lib/python3.8/site-packages (from seaborn) (3.7.1)
Requirement already satisfied: numpy!=1.24.0,>=1.17 in /opt/conda/lib/python3.8/site-packages (from seaborn) (1.20.3)
Requirement already satisfied: pandas>=0.25 in /opt/conda/lib/python3.8/site-packages (from seaborn) (1.5.3)
Requirement already satisfied: cycler>=0.10 in /opt/conda/lib/python3.8/site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (0.11.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /opt/conda/lib/python3.8/site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (1.4.4)
Requirement already satisfied: pillow>=6.2.0 in /opt/conda/lib/python3.8/site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (9.5.0)
Requirement already satisfied: contourpy>=1.0.1 in /opt/conda/lib/python3.8/site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (1.0.7)
Requirement already satisfied: fonttools>=4.22.0 in /opt/conda/lib/python3.8/site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (4.39.4)
Requirement already satisfied: packaging>=20.0 in /opt/conda/lib/python3.8/site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (23.1)
Requirement already satisfied: python-dateutil>=2.7 in /opt/conda/lib/python3.8/site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (2.8.2)
Requirement already satisfied: pyparsing>=2.3.1 in /opt/conda/lib/python3.8/site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (3.0.9)
Requirement already satisfied: importlib-resources>=3.2.0 in /opt/conda/lib/python3.8/site-packages (from matplotlib!=3.6.1,>=3.1->seaborn) (5.12.0)
Requirement already satisfied: zipp>=3.1.0 in /opt/conda/lib/python3.8/site-packages (from importlib-resources>=3.2.0->matplotlib!=3.6.1,>=3.1->seaborn) (3.15.0)
Requirement already satisfied: pytz>=2020.1 in /opt/conda/lib/python3.8/site-packages (from pandas>=0.25->seaborn) (2023.3)
Requirement already satisfied: six>=1.5 in /opt/conda/lib/python3.8/site-packages (from python-dateutil>=2.7->matplotlib!=3.6.1,>=3.1->seaborn) (1.16.0)
Installing collected packages: seaborn
Successfully installed seaborn-0.12.2
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting lightgbm
  Downloading lightgbm-3.3.5-py3-none-manylinux1_x86_64.whl (2.0 MB)
     |████████████████████████████████| 2.0 MB 319 kB/s eta 0:00:01
Requirement already satisfied: scikit-learn!=0.22.0 in /opt/conda/lib/python3.8/site-packages (from lightgbm) (0.24.2)
Requirement already satisfied: wheel in /opt/conda/lib/python3.8/site-packages (from lightgbm) (0.40.0)
Requirement already satisfied: numpy in /opt/conda/lib/python3.8/site-packages (from lightgbm) (1.20.3)
Requirement already satisfied: scipy in /opt/conda/lib/python3.8/site-packages (from lightgbm) (1.6.3)
Requirement already satisfied: joblib>=0.11 in /opt/conda/lib/python3.8/site-packages (from scikit-learn!=0.22.0->lightgbm) (1.1.0)
Requirement already satisfied: threadpoolctl>=2.0.0 in /opt/conda/lib/python3.8/site-packages (from scikit-learn!=0.22.0->lightgbm) (3.1.0)
Installing collected packages: lightgbm
Successfully installed lightgbm-3.3.5
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
--2023-06-17 12:34:08--  https://dp-public.oss-cn-beijing.aliyuncs.com/community/hERG.csv
Resolving ga.dp.tech (ga.dp.tech)... 10.255.255.41
Connecting to ga.dp.tech (ga.dp.tech)|10.255.255.41|:8118... connected.
Proxy request sent, awaiting response... 200 OK
Length: 560684 (548K) [text/csv]
Saving to: ‘datasets/hERG.csv’

datasets/hERG.csv   100%[===================>] 547.54K  --.-KB/s    in 0.08s   

2023-06-17 12:34:09 (6.55 MB/s) - ‘datasets/hERG.csv’ saved [560684/560684]


Then, we can take a look at the composition of this dataset:

[8]
import pandas as pd
import numpy as np

data = pd.read_csv("./datasets/hERG.csv")
print("------------ Original data ------------")
print(data)
data.columns = ["SMILES", "TARGET"]

# Set 80% of the dataset as the training set and 20% as the test set
train_fraction = 0.8
train_data = data.sample(frac=train_fraction, random_state=1)
train_data.to_csv("./datasets/hERG_train.csv", index=False)
test_data = data.drop(train_data.index)
test_data.to_csv("./datasets/hERG_test.csv", index=False)

# Set training/test targets
train_y = np.array(train_data["TARGET"].values.tolist())
test_y = np.array(test_data["TARGET"].values.tolist())

# Create a results dictionary to store future results
results = {}

# Visualize the distribution of the prediction target
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(6,4), dpi=150)
font = {'family': 'serif',
        'color': 'black',
        'weight': 'normal',
        'size': 15}
plt.hist(train_data["TARGET"], bins=20, label="Train Data")
plt.hist(test_data["TARGET"], bins=20, label="Test Data")
plt.ylabel("Count", fontdict=font)
plt.xlabel("pIC50", fontdict=font)
plt.legend()
plt.show()

------------ Original data ------------
                                                 SMILES  pIC50
0     Cc1ccc(CN2[C@@H]3CC[C@H]2C[C@@H](C3)Oc4cccc(c4...   9.85
1     COc1nc2ccc(Br)cc2cc1[C@@H](c3ccccc3)[C@@](O)(C...   9.70
2     NC(=O)c1cccc(O[C@@H]2C[C@H]3CC[C@@H](C2)N3CCCc...   9.60
3                          CCCCCCCc1cccc([n+]1C)CCCCCCC   9.60
4     Cc1ccc(CN2[C@@H]3CC[C@H]2C[C@@H](C3)Oc4cccc(c4...   9.59
...                                                 ...    ...
9199  O=C1[C@H]2N(c3ccc(OCC=CCCNCC(=O)Nc4c(Cl)cc(cc4...   4.89
9200  O=C1[C@H]2N(c3ccc(OCCCCCNCC(=O)Nc4c(Cl)cc(cc4C...   4.89
9201  O=C1[C@H]2N(c3ccc(OCC=CCCCNCC(=O)Nc4c(Cl)cc(cc...   4.89
9202  O=C1[C@H]2N(c3ccc(OCCCCCCNCC(=O)Nc4c(Cl)cc(cc4...   4.49
9203  O=C1N=C/C(=C2\N(c3c(cc(Cl)c(Cl)c3)N\2)Cc4cc(Cl...   5.30

[9204 rows x 2 columns]
[Figure: overlaid histograms of pIC50 values for the training and test sets]

You can see that in the hERG dataset:

  • Molecules are represented by SMILES strings;
  • The task is regression: predicting the inhibitory activity of each molecule against the protein, expressed as pIC50.

This is a common molecular property prediction task. Alright, let's put this dataset aside for now. Next, let's officially start exploring.
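
As a reminder, pIC50 is the negative base-10 logarithm of IC50 expressed in molar units, so a higher pIC50 means a more potent inhibitor. A quick sanity check (a small illustrative snippet, not part of the original notebook):

import numpy as np

ic50_molar = 1e-6                 # an IC50 of 1 µM, i.e., 1e-6 mol/L
pic50 = -np.log10(ic50_molar)     # pIC50 = -log10(IC50 in mol/L)
print(pic50)                      # 6.0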


A Brief History of QSAR

Quantitative Structure-Activity Relationship (QSAR) is a method that studies the quantitative relationship between the chemical structure of compounds and their biological activity. It is one of the most important tools in Computer-Aided Drug Design (CADD). QSAR aims to establish mathematical models to relate molecular structures with their biochemical and physicochemical properties, helping drug scientists to make rational predictions about the properties of new drug molecules.

QSAR evolved from Structure-Activity Relationship (SAR) analysis. The origins of SAR can be traced back to the late 19th century when chemists began studying the relationship between compound structures and biological activity. German chemist Paul Ehrlich (1854-1915) proposed the "lock-and-key" hypothesis, suggesting that the interaction between compounds (keys) and biological targets (locks) depends on their spatial matching. As scientists deepened their understanding of molecular interactions, they realized that besides spatial matching, the properties of the target surface (e.g., hydrophobicity, electrophilicity) and the corresponding properties of the ligand structure were also crucial. This led to the development of a series of methods to evaluate the structural characteristics and binding affinity, known as Structure-Activity Relationships.

However, the SAR method mainly relied on the experience and intuitive judgment of chemists, lacking a rigorous theoretical foundation and unified analytical approach. To overcome these limitations, scientists began using mathematical and statistical methods in the 1960s to conduct quantitative analysis of the relationship between molecular structure and biological activity.

The earliest proposed QSAR model can be traced back to 1868, when chemist Alexander Crum Brown and physiologist Thomas R. Fraser began studying the relationship between compound structure and biological activity. In their research on the biological effects of alkaloids before and after methylation of the basic nitrogen atoms, they proposed that the physiological activity of a compound depends on its chemical constitution: the biological activity Φ is a function of the compound's composition C, i.e., Φ = f(C). This is known as the Crum-Brown equation, and it laid the foundation for later QSAR research.

Subsequently, various QSAR models were proposed in academia, such as Hammett's model linking the activity of organic compounds to electronic substituent effects, and the steric parameter model proposed by Taft. In 1964, Hansch and Fujita introduced the well-known Hansch model, which holds that a molecule's biological activity is mainly determined by its hydrophobic effect (π), steric effect (Es), and electronic effect (σ), and assumes that these three effects contribute independently and additively. The model takes the form log(1/C) = aπ + bσ + cEs + d, where C is the dose producing a standard biological response. The Hansch model was the first to quantitatively describe the relationship between chemical information and drug biological activity, providing a practical theoretical framework for subsequent QSAR research. It is considered a crucial milestone in the transition from blind drug design to rational drug design.

Today, QSAR has developed into a mature research field involving various computational methods and techniques. In recent years, with the rapid development of machine learning and artificial intelligence technologies, QSAR methods have been further expanded and applied. For example, deep learning techniques have been used to build QSAR models, enhancing their predictive capabilities and accuracy. Furthermore, QSAR methods have found broad applications in fields such as environmental science and materials science, demonstrating strong potential and a wide range of application prospects.


Basic Requirements for QSAR Modeling

At an international conference held in Setubal, Portugal, in 2002, scientists proposed several rules regarding the validity of QSAR models, known as the "Setubal Principles." These rules were further refined in November 2004 and officially named the "OECD Principles." For a QSAR model to be used for regulatory purposes, it should meet the following 5 conditions:

  1. A defined endpoint
  2. An unambiguous algorithm
  3. A defined domain of applicability
  4. Appropriate measures of goodness-of-fit, robustness, and predictivity
  5. A mechanistic interpretation, if possible

Basic Workflow of QSAR Modeling

Building an effective QSAR model mainly involves three steps:

  1. Constructing a reasonable molecular representation, which converts molecular structures into computer-readable numerical representations;
  2. Selecting a suitable machine learning model for the molecular representation and using existing molecule-property data to train the model;
  3. Using the trained machine learning model to predict the properties of molecules with unknown properties.

Since molecular structures are not in a computer-readable format, we must first convert them into numerical vectors that can be read by computers. This allows for the selection of appropriate mathematical models based on these representations. We call this process molecular representation. Effective molecular representation and the choice of compatible mathematical models are the core of building quantitative structure-activity relationship models.


Molecular Representation

Molecular representation is a numerical description of a molecule that encodes its properties. Common molecular representation methods include molecular descriptors, molecular fingerprints, SMILES strings, and molecular potential functions.

(Image source: Wei, J., Chu, X., Sun, X. Y., Xu, K., Deng, H. X., Chen, J., ... & Lei, M. (2019). Machine learning in materials science. InfoMat, 1(3), 338-358.)

In fact, QSAR has developed hand in hand with molecular representations of increasing information content and changing form, leading to the classification of QSAR models into 1D-QSAR, 2D-QSAR, and 3D-QSAR.

Different molecular representations have distinct numerical characteristics, requiring different machine learning/deep learning models for modeling. Next, we will demonstrate how to build 1D-QSAR, 2D-QSAR, and 3D-QSAR models through practical examples.


1D-QSAR Molecular Representation

Early quantitative structure-activity relationship models mostly used physicochemical properties of molecules, such as molecular weight, water solubility, and molecular surface area, as the method of representation. These physicochemical properties are known as molecular descriptors. This defines the 1D-QSAR stage.

At this stage, experienced scientists often rely on their domain knowledge to design molecular descriptors, constructing properties that may be related to the characteristic being studied. For example, if the goal is to predict whether a drug can pass through the blood-brain barrier, this property may be related to the drug's water solubility, molecular weight, polar surface area, and other physicochemical attributes. Scientists would include such attributes in the molecular descriptors.

During this period, due to limited access to computers or insufficient computational power, scientists often used simple mathematical models for modeling, such as linear regression and random forests. Since molecular representations constructed from descriptors are typically low-dimensional real-valued vectors, these mathematical models are well-suited for this kind of work.

[11]
from rdkit import Chem
from rdkit.Chem import Descriptors

def calculate_1dqsar_repr(smiles):
    # Create a molecule object from the SMILES string
    mol = Chem.MolFromSmiles(smiles)
    # Calculate the molecular weight
    mol_weight = Descriptors.MolWt(mol)
    # Calculate the LogP value of the molecule
    log_p = Descriptors.MolLogP(mol)
    # Calculate the number of hydrogen bond donors in the molecule
    num_h_donors = Descriptors.NumHDonors(mol)
    # Calculate the number of hydrogen bond acceptors in the molecule
    num_h_acceptors = Descriptors.NumHAcceptors(mol)
    # Calculate the topological polar surface area (TPSA) of the molecule
    tpsa = Descriptors.TPSA(mol)
    # Calculate the number of rotatable bonds in the molecule
    num_rotatable_bonds = Descriptors.NumRotatableBonds(mol)
    # Calculate the number of aromatic rings in the molecule
    num_aromatic_rings = Descriptors.NumAromaticRings(mol)
    # Calculate the number of aliphatic rings in the molecule
    num_aliphatic_rings = Descriptors.NumAliphaticRings(mol)
    # Calculate the number of saturated rings in the molecule
    num_saturated_rings = Descriptors.NumSaturatedRings(mol)
    # Calculate the number of heteroatoms in the molecule
    num_heteroatoms = Descriptors.NumHeteroatoms(mol)
    # Calculate the number of valence electrons in the molecule
    num_valence_electrons = Descriptors.NumValenceElectrons(mol)
    # Calculate the number of radical electrons in the molecule
    num_radical_electrons = Descriptors.NumRadicalElectrons(mol)
    # Calculate the QED (quantitative estimation of drug-likeness) value of the molecule
    qed = Descriptors.qed(mol)
    # Return all calculated properties
    return [mol_weight, log_p, num_h_donors, num_h_acceptors, tpsa, num_rotatable_bonds, num_aromatic_rings,
            num_aliphatic_rings, num_saturated_rings, num_heteroatoms, num_valence_electrons, num_radical_electrons, qed]

# Apply the function to calculate 1D-QSAR molecular representation for training and testing data
train_data["1dqsar_mr"] = train_data["SMILES"].apply(calculate_1dqsar_repr)
test_data["1dqsar_mr"] = test_data["SMILES"].apply(calculate_1dqsar_repr)

[12]
print(train_data["1dqsar_mr"][:1].values.tolist())
[[464.87300000000016, 2.5531800000000002, 1, 10, 140.73000000000002, 8, 4, 0, 0, 12, 166, 0, 0.4159359067517256]]
[13]
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.neural_network import MLPRegressor
from xgboost import XGBRegressor
from sklearn.linear_model import Ridge, Lasso, ElasticNet
from sklearn.svm import SVR
from sklearn.ensemble import GradientBoostingRegressor
from lightgbm import LGBMRegressor
from sklearn.metrics import mean_squared_error

# Convert training and testing data to NumPy arrays
train_x = np.array(train_data["1dqsar_mr"].values.tolist())
train_y = np.array(train_data["TARGET"].values.tolist())
test_x = np.array(test_data["1dqsar_mr"].values.tolist())
test_y = np.array(test_data["TARGET"].values.tolist())

# Define the list of regressors to use
regressors = [
    ("Linear Regression", LinearRegression()),                    # Linear regression model
    ("Ridge Regression", Ridge(random_state=42)),                 # Ridge regression model
    ("Lasso Regression", Lasso(random_state=42)),                 # Lasso regression model
    ("ElasticNet Regression", ElasticNet(random_state=42)),       # ElasticNet regression model
    ("Support Vector", SVR()),                                    # Support vector regression model
    ("K-Nearest Neighbors", KNeighborsRegressor()),               # K-nearest neighbors regression model
    ("Decision Tree", DecisionTreeRegressor(random_state=42)),    # Decision tree regression model
    ("Random Forest", RandomForestRegressor(random_state=42)),    # Random forest regression model
    ("Gradient Boosting", GradientBoostingRegressor(random_state=42)),  # Gradient boosting regression model
    ("XGBoost", XGBRegressor(random_state=42)),                   # XGBoost regression model
    ("LightGBM", LGBMRegressor(random_state=42)),                 # LightGBM regression model
    ("Multi-layer Perceptron", MLPRegressor(                      # Multi-layer perceptron (neural network) regression model
        hidden_layer_sizes=(128, 64, 32),
        learning_rate_init=0.0001,
        activation='relu', solver='adam',
        max_iter=10000, random_state=42)),
]

# Train and predict for each regressor, and calculate performance metrics
for name, regressor in regressors:
    # Train the regressor
    regressor.fit(train_x, train_y)
    # Predict training data and testing data
    pred_train_y = regressor.predict(train_x)
    pred_test_y = regressor.predict(test_x)
    # Add predictions to training and testing data
    train_data[f"1D-QSAR-{name}_pred"] = pred_train_y
    test_data[f"1D-QSAR-{name}_pred"] = pred_test_y
    # Calculate performance metrics for testing data
    mse = mean_squared_error(test_y, pred_test_y)
    se = abs(test_y - pred_test_y)
    results[f"1D-QSAR-{name}"] = {"MSE": mse, "error": se}
    print(f"[1D-QSAR][{name}]\tMSE:{mse:.4f}")

[1D-QSAR][Linear Regression]	MSE:0.8857
[1D-QSAR][Ridge Regression]	MSE:0.8857
[1D-QSAR][Lasso Regression]	MSE:0.9286
[1D-QSAR][ElasticNet Regression]	MSE:0.9269
[1D-QSAR][Support Vector]	MSE:0.9398
[1D-QSAR][K-Nearest Neighbors]	MSE:0.9110
[1D-QSAR][Decision Tree]	MSE:1.0579
[1D-QSAR][Random Forest]	MSE:0.6052
[1D-QSAR][Gradient Boosting]	MSE:0.7607
[1D-QSAR][XGBoost]	MSE:0.6057
[1D-QSAR][LightGBM]	MSE:0.6426
[1D-QSAR][Multi-layer Perceptron]	MSE:0.9385
[14]
import matplotlib.pyplot as plt
import seaborn as sns

# Plot residuals
residuals_data = []
for name, result in results.items():
    if name.startswith("1D-QSAR"):
        model_residuals = pd.DataFrame({"Model": name, "Error": result["error"]})
        residuals_data.append(model_residuals)

residuals_df = pd.concat(residuals_data, ignore_index=True)
residuals_df.sort_values(by="Error", ascending=True, inplace=True)
model_order = residuals_df.groupby("Model")["Error"].median().sort_values(ascending=True).index

# Use seaborn to draw a box plot of the absolute errors
plt.figure(figsize=(10, 7), dpi=150)
font = {'family': 'serif',
        'color': 'black',
        'weight': 'normal',
        'size': 15}
sns.boxplot(y="Model", x="Error", data=residuals_df, order=model_order)
plt.xlabel("Abs Error", fontdict=font)
plt.ylabel("Models", fontdict=font)
plt.show()

[Figure: box plot of absolute test-set errors for each 1D-QSAR model]

2D-QSAR Molecular Characterization

However, when faced with predicting molecular properties whose biochemical mechanisms are unclear, scientists may find it difficult to design effective molecular descriptors, and QSAR model construction fails. Since molecular properties are largely determined by molecular structure, such as the functional groups present in the molecule, there was strong interest in incorporating molecular bonding relationships into QSAR modeling. With this, the field entered the 2D-QSAR stage.

One of the earlier methods proposed was the molecular fingerprint, such as the Morgan fingerprint, which characterizes a molecule by traversing the bonding relationships of each atom and its surrounding atoms. To ensure that molecules of different sizes are represented by numerical vectors of the same length, molecular fingerprints typically apply hashing, yielding high-dimensional 0/1 vectors. In this scenario, scientists usually choose machine learning methods that handle high-dimensional sparse vectors well, such as support vector machines and fully connected neural networks, for model construction.
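
To make the hashing step concrete, the short RDKit sketch below (illustrative, not part of the original notebook; it uses the bitInfo argument of GetMorganFingerprintAsBitVect) shows which atom environments set which bits, and why distinct environments can collide on the same bit:

from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin, purely for illustration
bit_info = {}
# bit_info maps: fingerprint bit -> tuples of (center atom index, radius) that set it
fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=512, bitInfo=bit_info)
print(f"{fp.GetNumOnBits()} of 512 bits set")
# Bits hit by more than one atom environment are hash collisions
print({bit: envs for bit, envs in bit_info.items() if len(envs) > 1})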

With the development of AI models, deep learning models capable of handling sequence data (e.g., text) like Recurrent Neural Networks (RNN), image data like Convolutional Neural Networks (CNN), and unstructured graph data like Graph Neural Networks (GNN) have been proposed and applied. QSAR models have also been constructed to fit molecular representations based on the data characteristics these models can handle. For example, SMILES string representations of molecules have been applied in RNN modeling, 2D images of molecules in CNN modeling, and the bonding topological structure of molecules converted into graphs in GNN modeling, leading to the development of a series of QSAR modeling methods.

Overall, in the 2D-QSAR stage, various methods are utilized to analyze the bonding relationships (topological structure) of molecules to model and predict molecular properties.

[15]
import numpy as np
from rdkit.Chem import AllChem

def calculate_2dqsar_repr(smiles):
    # Convert the SMILES string to an RDKit molecule object
    mol = Chem.MolFromSmiles(smiles)
    # Calculate the Morgan fingerprint (radius 3, length 512 bits)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 3, nBits=512)
    # Return the fingerprint as a NumPy array
    return np.array(fp)

# Apply the function to calculate 2D-QSAR molecular representation for training and testing data
train_data["2dqsar_mr"] = train_data["SMILES"].apply(calculate_2dqsar_repr)
test_data["2dqsar_mr"] = test_data["SMILES"].apply(calculate_2dqsar_repr)

[16]
print(train_data["2dqsar_mr"][:1].values.tolist())
[array([0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1,
       0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0,
       0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0,
       0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1,
       0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
       0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0,
       0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0,
       1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
       1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0,
       1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 1, 0, 0, 0, 1])]
[17]
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.neural_network import MLPRegressor
from xgboost import XGBRegressor
from sklearn.linear_model import Ridge, Lasso, ElasticNet
from sklearn.svm import SVR
from sklearn.ensemble import GradientBoostingRegressor
from lightgbm import LGBMRegressor
from sklearn.metrics import mean_squared_error

# Convert training and testing data to NumPy arrays
train_x = np.array(train_data["2dqsar_mr"].values.tolist())
train_y = np.array(train_data["TARGET"].values.tolist())
test_x = np.array(test_data["2dqsar_mr"].values.tolist())
test_y = np.array(test_data["TARGET"].values.tolist())

# Define the list of regressors to use
regressors = [
    ("Linear Regression", LinearRegression()),                    # Linear regression model
    ("Ridge Regression", Ridge(random_state=42)),                 # Ridge regression model
    ("Lasso Regression", Lasso(random_state=42)),                 # Lasso regression model
    ("ElasticNet Regression", ElasticNet(random_state=42)),       # ElasticNet regression model
    ("Support Vector", SVR()),                                    # Support vector regression model
    ("K-Nearest Neighbors", KNeighborsRegressor()),               # K-nearest neighbors regression model
    ("Decision Tree", DecisionTreeRegressor(random_state=42)),    # Decision tree regression model
    ("Random Forest", RandomForestRegressor(random_state=42)),    # Random forest regression model
    ("Gradient Boosting", GradientBoostingRegressor(random_state=42)),  # Gradient boosting regression model
    ("XGBoost", XGBRegressor(random_state=42)),                   # XGBoost regression model
    ("LightGBM", LGBMRegressor(random_state=42)),                 # LightGBM regression model
    ("Multi-layer Perceptron", MLPRegressor(                      # Multi-layer perceptron (neural network) regression model
        hidden_layer_sizes=(128, 64, 32),
        learning_rate_init=0.0001,
        activation='relu', solver='adam',
        max_iter=10000, random_state=42)),
]

# Train and predict for each regressor, and calculate performance metrics
for name, regressor in regressors:
    # Train the regressor
    regressor.fit(train_x, train_y)
    # Predict training data and testing data
    pred_train_y = regressor.predict(train_x)
    pred_test_y = regressor.predict(test_x)
    # Add predictions to training and testing data
    train_data[f"2D-QSAR-{name}_pred"] = pred_train_y
    test_data[f"2D-QSAR-{name}_pred"] = pred_test_y
    # Calculate performance metrics for testing data
    mse = mean_squared_error(test_y, pred_test_y)
    se = abs(test_y - pred_test_y)
    results[f"2D-QSAR-{name}"] = {"MSE": mse, "error": se}
    print(f"[2D-QSAR][{name}]\tMSE:{mse:.4f}")

[2D-QSAR][Linear Regression]	MSE:0.7156
[2D-QSAR][Ridge Regression]	MSE:0.7154
[2D-QSAR][Lasso Regression]	MSE:1.0109
[2D-QSAR][ElasticNet Regression]	MSE:1.0109
[2D-QSAR][Support Vector]	MSE:0.4554
[2D-QSAR][K-Nearest Neighbors]	MSE:0.4806
[2D-QSAR][Decision Tree]	MSE:0.8892
[2D-QSAR][Random Forest]	MSE:0.4717
[2D-QSAR][Gradient Boosting]	MSE:0.6694
[2D-QSAR][XGBoost]	MSE:0.4591
[2D-QSAR][LightGBM]	MSE:0.4797
[2D-QSAR][Multi-layer Perceptron]	MSE:0.6933
[18]
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
import seaborn as sns

# Plot residuals
residuals_data = []
for name, result in results.items():
    if name.startswith("2D-QSAR"):
        model_residuals = pd.DataFrame({"Model": name, "Error": result["error"]})
        residuals_data.append(model_residuals)

residuals_df = pd.concat(residuals_data, ignore_index=True)
residuals_df.sort_values(by="Error", ascending=True, inplace=True)
model_order = residuals_df.groupby("Model")["Error"].median().sort_values(ascending=True).index

# Use seaborn to draw a box plot of the absolute errors
plt.figure(figsize=(10, 7), dpi=150)
font = {'family': 'serif',
        'color': 'black',
        'weight': 'normal',
        'size': 15}
sns.boxplot(y="Model", x="Error", data=residuals_df, order=model_order)
plt.xlabel("Abs Error", fontdict=font)
plt.ylabel("Models", fontdict=font)
plt.show()

[Figure: box plot of absolute test-set errors for each 2D-QSAR model]

3D-QSAR Molecular Characterization

However, due to the presence of intermolecular and intramolecular interactions, molecules with similar topological structures may adopt different conformations in various environments. The conformation of each molecule in different environments and the corresponding energy levels determine the true nature of the molecule. Therefore, scientists aim to incorporate the three-dimensional structure of molecules into QSAR modeling to enhance the ability to predict molecular properties in specific scenarios. This stage is referred to as the 3D-QSAR stage.

Comparative Molecular Field Analysis (CoMFA) is a widely used 3D-QSAR method. It characterizes the three-dimensional structure of a molecule by computing the interaction forces (i.e., fields) at positions in the space surrounding the molecule, usually sampled on a grid. There have also been other fruitful attempts in the field, including characterizations based on electron density, three-dimensional molecular images, or molecular graphs augmented with geometric information.

To handle such high-dimensional spatial information, scientists often turn to deep learning methods such as deeper fully connected networks (FCNNs), 3D-CNNs, and GNNs. In the demonstration below, we build a simple Coulomb-matrix-style 3D representation from atomic partial charges and atom coordinates (2D coordinates by default, for speed; set three_d=True to embed 3D conformers).

[19]
from rdkit.Chem import rdPartialCharges

def calculate_3dqsar_repr(SMILES, max_atoms=100, three_d=False):
    mol = Chem.MolFromSmiles(SMILES)  # Create a molecule object from the SMILES representation
    mol = Chem.AddHs(mol)  # Add hydrogen atoms
    if three_d:
        AllChem.EmbedMolecule(mol, AllChem.ETKDG())  # Compute 3D coordinates
    else:
        AllChem.Compute2DCoords(mol)  # Compute 2D coordinates
    natoms = mol.GetNumAtoms()  # Get the number of atoms
    rdPartialCharges.ComputeGasteigerCharges(mol)  # Calculate Gasteiger charges for the molecule
    charges = np.array([float(atom.GetProp("_GasteigerCharge")) for atom in mol.GetAtoms()])  # Retrieve charge values
    coords = mol.GetConformer().GetPositions()  # Get atom coordinates
    coulomb_matrix = np.zeros((max_atoms, max_atoms))  # Initialize the Coulomb matrix
    n = min(max_atoms, natoms)
    for i in range(n):  # Iterate through atom pairs
        for j in range(i, n):
            if i == j:
                coulomb_matrix[i, j] = 0.5 * charges[i] ** 2
            if i != j:
                delta = np.linalg.norm(coords[i] - coords[j])  # Calculate the distance between atoms
                if delta != 0:
                    coulomb_matrix[i, j] = charges[i] * charges[j] / delta  # Off-diagonal Coulomb term
                    coulomb_matrix[j, i] = coulomb_matrix[i, j]  # Mirror to keep the matrix symmetric
    coulomb_matrix = np.where(np.isinf(coulomb_matrix), 0, coulomb_matrix)  # Handle infinite values
    coulomb_matrix = np.where(np.isnan(coulomb_matrix), 0, coulomb_matrix)  # Handle NaN values
    return coulomb_matrix.reshape(max_atoms * max_atoms).tolist()  # Flatten the Coulomb matrix and return

# Apply the function to calculate 3D-QSAR molecular representation for training and testing data
train_data["3dqsar_mr"] = train_data["SMILES"].apply(calculate_3dqsar_repr)
test_data["3dqsar_mr"] = test_data["SMILES"].apply(calculate_3dqsar_repr)

[20]
print("length:", len(train_data["3dqsar_mr"][:1].values.tolist()[0]))
length: 10000

We can see that 3D-QSAR constructs very long molecular representations (10,000 dimensions here). Therefore, we first reduce the dimensionality of this representation with PCA.

[21]
from sklearn.decomposition import PCA

# Define a PCA object, with n_components set to 512, indicating dimensionality reduction to 512 dimensions
pca = PCA(n_components=512)

# Fit and transform the training data
train_data_pca = pca.fit_transform(np.array(train_data["3dqsar_mr"].tolist()))

# Transform the test data
test_data_pca = pca.transform(np.array(test_data["3dqsar_mr"].tolist()))

# Store the reduced-dimensionality data as new columns
train_data["3dqsar_mr_pca"] = train_data_pca.tolist()
test_data["3dqsar_mr_pca"] = test_data_pca.tolist()

[22]
# NOTE: The code in this cell was not preserved in the export. The snippet below
# is a reconstruction that mirrors the 1D-/2D-QSAR training loops above, reusing
# the `regressors` list and training on the PCA-reduced 3D representation.
train_x = np.array(train_data["3dqsar_mr_pca"].values.tolist())
test_x = np.array(test_data["3dqsar_mr_pca"].values.tolist())

# Train and predict for each regressor, and calculate performance metrics
for name, regressor in regressors:
    regressor.fit(train_x, train_y)
    pred_test_y = regressor.predict(test_x)
    mse = mean_squared_error(test_y, pred_test_y)
    se = abs(test_y - pred_test_y)
    results[f"3D-QSAR-{name}"] = {"MSE": mse, "error": se}
    print(f"[3D-QSAR][{name}]\tMSE:{mse:.4f}")

[3D-QSAR][Linear Regression]	MSE:34953863171341550859845632.0000
[3D-QSAR][Ridge Regression]	MSE:4392658479297741235159040.0000
[3D-QSAR][Lasso Regression]	MSE:805.7580
[3D-QSAR][ElasticNet Regression]	MSE:2390.2618
[3D-QSAR][Support Vector]	MSE:1.0427
[3D-QSAR][K-Nearest Neighbors]	MSE:1.1943
[3D-QSAR][Decision Tree]	MSE:1.5984
[3D-QSAR][Random Forest]	MSE:0.7831
[3D-QSAR][Gradient Boosting]	MSE:0.8663
[3D-QSAR][XGBoost]	MSE:0.8103
[3D-QSAR][LightGBM]	MSE:0.7307
[3D-QSAR][Multi-layer Perceptron]	MSE:3482168556886455484416.0000
[23]
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
import seaborn as sns

# Plot residuals
residuals_data = []
for name, result in results.items():
    if name.startswith("3D-QSAR"):
        # Skip models whose predictions diverged (e.g., the unregularized linear models)
        if result["MSE"] > 10:
            continue
        model_residuals = pd.DataFrame({"Model": name, "Error": result["error"]})
        residuals_data.append(model_residuals)

residuals_df = pd.concat(residuals_data, ignore_index=True)
residuals_df.sort_values(by="Error", ascending=True, inplace=True)
model_order = residuals_df.groupby("Model")["Error"].median().sort_values(ascending=True).index

# Use seaborn to draw a box plot of the absolute errors
plt.figure(figsize=(10, 7), dpi=150)
font = {'family': 'serif',
        'color': 'black',
        'weight': 'normal',
        'size': 15}
sns.boxplot(y="Model", x="Error", data=residuals_df, order=model_order)
plt.xlabel("Abs Error", fontdict=font)
plt.ylabel("Models", fontdict=font)
plt.show()

[Figure: box plot of absolute test-set errors for each 3D-QSAR model]

Uni-Mol Molecular Representation Learning and Pretraining Framework

Pretraining Model

One of the main challenges in QSAR modeling within the field of drug development is the limited amount of data. Due to the high cost and experimental difficulty of obtaining drug activity data, there is often a lack of labeled data. Insufficient data affects the model's predictive ability, as it may be difficult for the model to capture enough information to describe the relationship between compound structure and biological activity.

Faced with this situation of insufficient labeled data, the pretrain-finetune approach has become a common solution in more mature fields of machine learning, such as natural language processing (NLP) and computer vision (CV). Pretraining involves training the model on a large amount of unlabeled data through self-supervised learning, allowing the model to gain basic information and general capabilities. The model is then fine-tuned on a smaller set of labeled data through supervised learning to equip it with specific problem-solving abilities.

For example, if I want to perform image recognition of cats and dogs but lack sufficient labeled data, I can first pretrain the model using a large set of unlabeled images, enabling it to learn basic concepts of lines, shapes, and contours. Afterward, I can use supervised learning with cat and dog images, allowing the model to quickly learn to distinguish between cats and dogs based on contour information.

The pretraining approach can effectively utilize large amounts of easily accessible unlabeled data to improve the model's generalization ability and predictive performance. In QSAR modeling, we can also leverage the concept of pretraining to address issues related to data quantity and quality.

Introduction to Uni-Mol

Uni-Mol is a universal molecular representation learning framework based on 3D molecular structures, released by DeepModeling in May 2022. Uni-Mol takes 3D molecular structures as model input and uses around 200 million small molecule conformations and 3 million protein surface cavity structures. It pretrains the model using two self-supervised tasks: atom type restoration and atom coordinate restoration.

Uni-Mol Paper: https://openreview.net/forum?id=6K2RM6wVqKu
Open-source Code: https://github.com/dptech-corp/Uni-Mol

The representation learning from 3D information and the effective pretraining approach allow Uni-Mol to outperform SOTA (state of the art) models in almost all downstream tasks related to drug molecules and protein pockets. Uni-Mol can directly handle tasks such as molecular conformation generation and protein-ligand binding pose prediction, surpassing existing solutions. The paper was accepted at the top machine learning conference ICLR 2023.

Next, we will use Uni-Mol to build an hERG inhibitory activity prediction model:

[3]
from unimol import MolTrain

clf = MolTrain(task='regression',
               data_type='molecule',
               epochs=50,
               learning_rate=0.0001,
               batch_size=16,
               early_stopping=10,
               metrics='mse',
               split='random',
               save_path='./exp_reg_hERG_0616',
               )

clf.fit('datasets/hERG_train.csv')
2023-06-17 12:35:53 | unimol/data/datareader.py | 138 | INFO | Uni-Mol(QSAR) | Anomaly clean with 3 sigma threshold: 7363 -> 7232
2023-06-17 12:35:55 | unimol/data/conformer.py | 56 | INFO | Uni-Mol(QSAR) | Start generating conformers...
7232it [02:54, 41.48it/s]
2023-06-17 12:38:49 | unimol/data/conformer.py | 60 | INFO | Uni-Mol(QSAR) | Failed to generate conformers for 0.01% of molecules.
2023-06-17 12:38:49 | unimol/data/conformer.py | 62 | INFO | Uni-Mol(QSAR) | Failed to generate 3d conformers for 0.07% of molecules.
2023-06-17 12:38:49 | unimol/train.py | 86 | INFO | Uni-Mol(QSAR) | Output directory already exists: ./exp_reg_hERG_0616
2023-06-17 12:38:49 | unimol/train.py | 87 | INFO | Uni-Mol(QSAR) | Warning: Overwrite output directory: ./exp_reg_hERG_0616
2023-06-17 12:38:50 | unimol/models/unimol.py | 107 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2023-06-17 12:38:50 | unimol/models/nnmodel.py | 100 | INFO | Uni-Mol(QSAR) | start training Uni-Mol:unimolv1
2023-06-17 12:39:07 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [1/50] train_loss: 1.0198, val_loss: 0.9594, val_mse: 0.6890, lr: 0.000067, 14.3s
2023-06-17 12:39:16 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [2/50] train_loss: 0.9963, val_loss: 0.9689, val_mse: 0.7027, lr: 0.000099, 8.2s
2023-06-17 12:39:25 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [3/50] train_loss: 0.9583, val_loss: 1.0292, val_mse: 0.7534, lr: 0.000097, 8.1s
2023-06-17 12:39:33 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [4/50] train_loss: 0.9276, val_loss: 0.9263, val_mse: 0.6723, lr: 0.000095, 8.0s
2023-06-17 12:39:41 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [5/50] train_loss: 0.9060, val_loss: 0.8094, val_mse: 0.5913, lr: 0.000093, 8.1s
2023-06-17 12:39:50 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [6/50] train_loss: 0.8745, val_loss: 0.7400, val_mse: 0.5398, lr: 0.000091, 8.1s
2023-06-17 12:40:00 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [7/50] train_loss: 0.8458, val_loss: 0.7638, val_mse: 0.5611, lr: 0.000089, 8.9s
2023-06-17 12:40:08 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [8/50] train_loss: 0.8492, val_loss: 0.7316, val_mse: 0.5450, lr: 0.000087, 8.1s
2023-06-17 12:40:16 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [9/50] train_loss: 0.8398, val_loss: 0.6793, val_mse: 0.5067, lr: 0.000085, 8.0s
2023-06-17 12:40:24 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [10/50] train_loss: 0.8264, val_loss: 0.7524, val_mse: 0.5610, lr: 0.000082, 8.1s
2023-06-17 12:40:32 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [11/50] train_loss: 0.8191, val_loss: 0.6767, val_mse: 0.4993, lr: 0.000080, 8.2s
2023-06-17 12:40:41 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [12/50] train_loss: 0.8035, val_loss: 0.6709, val_mse: 0.4910, lr: 0.000078, 8.1s
2023-06-17 12:40:50 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [13/50] train_loss: 0.7891, val_loss: 0.6657, val_mse: 0.4940, lr: 0.000076, 8.2s
2023-06-17 12:40:58 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [14/50] train_loss: 0.7720, val_loss: 0.6442, val_mse: 0.4793, lr: 0.000074, 8.2s
2023-06-17 12:41:07 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [15/50] train_loss: 0.7678, val_loss: 0.6312, val_mse: 0.4662, lr: 0.000072, 8.3s
2023-06-17 12:41:16 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [16/50] train_loss: 0.7501, val_loss: 0.6796, val_mse: 0.5068, lr: 0.000070, 8.2s
2023-06-17 12:41:24 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [17/50] train_loss: 0.7714, val_loss: 0.5719, val_mse: 0.4233, lr: 0.000068, 8.2s
2023-06-17 12:41:33 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [18/50] train_loss: 0.7501, val_loss: 0.6096, val_mse: 0.4527, lr: 0.000066, 8.1s
2023-06-17 12:41:41 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [19/50] train_loss: 0.7531, val_loss: 0.7351, val_mse: 0.5461, lr: 0.000064, 8.1s
2023-06-17 12:41:49 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [20/50] train_loss: 0.7357, val_loss: 0.5855, val_mse: 0.4357, lr: 0.000062, 8.1s
2023-06-17 12:41:58 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [21/50] train_loss: 0.7334, val_loss: 0.5762, val_mse: 0.4231, lr: 0.000060, 8.2s
2023-06-17 12:42:07 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [22/50] train_loss: 0.7062, val_loss: 0.5763, val_mse: 0.4312, lr: 0.000058, 8.1s
2023-06-17 12:42:15 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [23/50] train_loss: 0.7371, val_loss: 0.5740, val_mse: 0.4278, lr: 0.000056, 8.2s
2023-06-17 12:42:23 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [24/50] train_loss: 0.7131, val_loss: 0.6085, val_mse: 0.4584, lr: 0.000054, 8.1s
2023-06-17 12:42:31 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [25/50] train_loss: 0.7075, val_loss: 0.5816, val_mse: 0.4340, lr: 0.000052, 8.1s
2023-06-17 12:42:39 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [26/50] train_loss: 0.6932, val_loss: 0.5505, val_mse: 0.4123, lr: 0.000049, 8.0s
2023-06-17 12:42:49 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [27/50] train_loss: 0.7009, val_loss: 0.8499, val_mse: 0.6284, lr: 0.000047, 8.2s
2023-06-17 12:42:57 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [28/50] train_loss: 0.6985, val_loss: 0.6224, val_mse: 0.4643, lr: 0.000045, 8.1s
2023-06-17 12:43:05 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [29/50] train_loss: 0.6649, val_loss: 0.6121, val_mse: 0.4566, lr: 0.000043, 8.3s
2023-06-17 12:43:13 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [30/50] train_loss: 0.6705, val_loss: 0.5974, val_mse: 0.4445, lr: 0.000041, 8.0s
2023-06-17 12:43:21 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [31/50] train_loss: 0.6639, val_loss: 0.6243, val_mse: 0.4603, lr: 0.000039, 8.1s
2023-06-17 12:43:29 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [32/50] train_loss: 0.6774, val_loss: 0.5461, val_mse: 0.4065, lr: 0.000037, 8.0s
2023-06-17 12:43:38 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [33/50] train_loss: 0.6568, val_loss: 0.5854, val_mse: 0.4339, lr: 0.000035, 8.1s
2023-06-17 12:43:46 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [34/50] train_loss: 0.6483, val_loss: 0.6151, val_mse: 0.4625, lr: 0.000033, 8.1s
2023-06-17 12:43:54 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [35/50] train_loss: 0.6560, val_loss: 0.5742, val_mse: 0.4311, lr: 0.000031, 8.1s
2023-06-17 12:44:03 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [36/50] train_loss: 0.6448, val_loss: 0.6001, val_mse: 0.4513, lr: 0.000029, 8.6s
2023-06-17 12:44:11 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [37/50] train_loss: 0.6241, val_loss: 0.6042, val_mse: 0.4489, lr: 0.000027, 8.2s
2023-06-17 12:44:19 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [38/50] train_loss: 0.6224, val_loss: 0.6076, val_mse: 0.4564, lr: 0.000025, 8.1s
2023-06-17 12:44:27 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [39/50] train_loss: 0.6247, val_loss: 0.5829, val_mse: 0.4303, lr: 0.000023, 8.2s
2023-06-17 12:44:36 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [40/50] train_loss: 0.6328, val_loss: 0.5770, val_mse: 0.4261, lr: 0.000021, 8.2s
2023-06-17 12:44:44 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [41/50] train_loss: 0.6317, val_loss: 0.5813, val_mse: 0.4320, lr: 0.000019, 8.1s
2023-06-17 12:44:52 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [42/50] train_loss: 0.6271, val_loss: 0.6083, val_mse: 0.4543, lr: 0.000016, 8.1s
2023-06-17 12:44:52 | unimol/utils/metrics.py | 255 | WARNING | Uni-Mol(QSAR) | Early stopping at epoch: 42
2023-06-17 12:44:53 | unimol/tasks/trainer.py | 197 | INFO | Uni-Mol(QSAR) | load model success!
2023-06-17 12:44:54 | unimol/models/nnmodel.py | 123 | INFO | Uni-Mol(QSAR) | fold 0, result {'mse': 0.40648797, 'mae': 0.4646326, 'spearmanr': 0.685573097620505, 'rmse': 0.63756406, 'r2': 0.44125538296758327}
2023-06-17 12:44:55 | unimol/models/unimol.py | 107 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2023-06-17 12:45:03 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [1/50] train_loss: 1.0218, val_loss: 1.0607, val_mse: 0.7615, lr: 0.000067, 8.2s
2023-06-17 12:45:12 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [2/50] train_loss: 1.0089, val_loss: 0.9100, val_mse: 0.6544, lr: 0.000099, 8.1s
2023-06-17 12:45:21 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [3/50] train_loss: 0.9560, val_loss: 0.8490, val_mse: 0.6177, lr: 0.000097, 8.1s
2023-06-17 12:45:30 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [4/50] train_loss: 0.9313, val_loss: 0.8471, val_mse: 0.6134, lr: 0.000095, 8.1s
2023-06-17 12:45:39 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [5/50] train_loss: 0.9388, val_loss: 0.7970, val_mse: 0.5788, lr: 0.000093, 8.2s
2023-06-17 12:45:48 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [6/50] train_loss: 0.8808, val_loss: 0.7646, val_mse: 0.5524, lr: 0.000091, 8.1s
2023-06-17 12:45:57 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [7/50] train_loss: 0.8609, val_loss: 0.8060, val_mse: 0.5749, lr: 0.000089, 8.1s
2023-06-17 12:46:06 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [8/50] train_loss: 0.8700, val_loss: 0.7409, val_mse: 0.5293, lr: 0.000087, 8.1s
2023-06-17 12:46:15 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [9/50] train_loss: 0.8086, val_loss: 0.7425, val_mse: 0.5365, lr: 0.000085, 8.1s
2023-06-17 12:46:23 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [10/50] train_loss: 0.8355, val_loss: 0.8126, val_mse: 0.5832, lr: 0.000082, 8.2s
2023-06-17 12:46:31 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [11/50] train_loss: 0.8044, val_loss: 0.6809, val_mse: 0.4987, lr: 0.000080, 8.2s
2023-06-17 12:46:40 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [12/50] train_loss: 0.7771, val_loss: 0.6235, val_mse: 0.4526, lr: 0.000078, 8.2s
2023-06-17 12:46:49 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [13/50] train_loss: 0.7960, val_loss: 0.6240, val_mse: 0.4551, lr: 0.000076, 8.1s
2023-06-17 12:46:58 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [14/50] train_loss: 0.7717, val_loss: 0.6819, val_mse: 0.5010, lr: 0.000074, 8.2s
2023-06-17 12:47:06 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [15/50] train_loss: 0.7460, val_loss: 0.6513, val_mse: 0.4798, lr: 0.000072, 8.0s
2023-06-17 12:47:14 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [16/50] train_loss: 0.7331, val_loss: 0.6302, val_mse: 0.4661, lr: 0.000070, 8.2s
2023-06-17 12:47:22 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [17/50] train_loss: 0.7535, val_loss: 0.5941, val_mse: 0.4365, lr: 0.000068, 8.2s
2023-06-17 12:47:31 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [18/50] train_loss: 0.7380, val_loss: 0.5936, val_mse: 0.4342, lr: 0.000066, 8.1s
2023-06-17 12:47:40 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [19/50] train_loss: 0.7061, val_loss: 0.6066, val_mse: 0.4422, lr: 0.000064, 8.2s
2023-06-17 12:47:48 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [20/50] train_loss: 0.7326, val_loss: 0.6399, val_mse: 0.4771, lr: 0.000062, 8.2s
2023-06-17 12:47:57 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [21/50] train_loss: 0.7176, val_loss: 0.6288, val_mse: 0.4616, lr: 0.000060, 9.0s
2023-06-17 12:48:06 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [22/50] train_loss: 0.7048, val_loss: 0.6277, val_mse: 0.4632, lr: 0.000058, 9.2s
2023-06-17 12:48:15 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [23/50] train_loss: 0.6901, val_loss: 0.5978, val_mse: 0.4354, lr: 0.000056, 9.1s
2023-06-17 12:48:23 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [24/50] train_loss: 0.6980, val_loss: 0.5127, val_mse: 0.3796, lr: 0.000054, 8.3s
2023-06-17 12:48:33 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [25/50] train_loss: 0.6828, val_loss: 0.6137, val_mse: 0.4506, lr: 0.000052, 8.1s
2023-06-17 12:48:41 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [26/50] train_loss: 0.6848, val_loss: 0.5487, val_mse: 0.4038, lr: 0.000049, 8.1s
2023-06-17 12:48:49 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [27/50] train_loss: 0.6755, val_loss: 0.5651, val_mse: 0.4137, lr: 0.000047, 8.2s
2023-06-17 12:48:58 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [28/50] train_loss: 0.6721, val_loss: 0.5640, val_mse: 0.4132, lr: 0.000045, 8.2s
2023-06-17 12:49:06 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [29/50] train_loss: 0.6658, val_loss: 0.5870, val_mse: 0.4300, lr: 0.000043, 8.1s
2023-06-17 12:49:14 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [30/50] train_loss: 0.6704, val_loss: 0.5301, val_mse: 0.3882, lr: 0.000041, 8.2s
2023-06-17 12:49:22 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [31/50] train_loss: 0.6458, val_loss: 0.5127, val_mse: 0.3740, lr: 0.000039, 8.1s
2023-06-17 12:49:31 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [32/50] train_loss: 0.6604, val_loss: 0.5854, val_mse: 0.4273, lr: 0.000037, 8.1s
2023-06-17 12:49:39 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [33/50] train_loss: 0.6626, val_loss: 0.6108, val_mse: 0.4480, lr: 0.000035, 8.2s
2023-06-17 12:49:47 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [34/50] train_loss: 0.6247, val_loss: 0.5063, val_mse: 0.3733, lr: 0.000033, 8.2s
2023-06-17 12:49:56 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [35/50] train_loss: 0.6381, val_loss: 0.5741, val_mse: 0.4197, lr: 0.000031, 8.1s
2023-06-17 12:50:05 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [36/50] train_loss: 0.6286, val_loss: 0.5272, val_mse: 0.3871, lr: 0.000029, 8.0s
2023-06-17 12:50:13 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [37/50] train_loss: 0.6384, val_loss: 0.5324, val_mse: 0.3904, lr: 0.000027, 8.2s
2023-06-17 12:50:21 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [38/50] train_loss: 0.6329, val_loss: 0.5152, val_mse: 0.3793, lr: 0.000025, 8.1s
2023-06-17 12:50:29 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [39/50] train_loss: 0.6288, val_loss: 0.5184, val_mse: 0.3816, lr: 0.000023, 8.0s
2023-06-17 12:50:37 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [40/50] train_loss: 0.6354, val_loss: 0.5204, val_mse: 0.3816, lr: 0.000021, 8.1s
2023-06-17 12:50:45 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [41/50] train_loss: 0.6100, val_loss: 0.5263, val_mse: 0.3872, lr: 0.000019, 8.1s
2023-06-17 12:50:53 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [42/50] train_loss: 0.6270, val_loss: 0.5422, val_mse: 0.3978, lr: 0.000016, 8.2s
2023-06-17 12:51:01 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [43/50] train_loss: 0.5859, val_loss: 0.5251, val_mse: 0.3867, lr: 0.000014, 8.1s
2023-06-17 12:51:10 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [44/50] train_loss: 0.5847, val_loss: 0.5394, val_mse: 0.3978, lr: 0.000012, 8.7s
2023-06-17 12:51:10 | unimol/utils/metrics.py | 255 | WARNING | Uni-Mol(QSAR) | Early stopping at epoch: 44
2023-06-17 12:51:11 | unimol/tasks/trainer.py | 197 | INFO | Uni-Mol(QSAR) | load model success!
2023-06-17 12:51:13 | unimol/models/nnmodel.py | 123 | INFO | Uni-Mol(QSAR) | fold 1, result {'mse': 0.3733483, 'mae': 0.43755728, 'spearmanr': 0.7287544581640742, 'rmse': 0.61102235, 'r2': 0.4841641368912225}
2023-06-17 12:51:13 | unimol/models/unimol.py | 107 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2023-06-17 12:51:23 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [1/50] train_loss: 1.0566, val_loss: 0.9536, val_mse: 0.7052, lr: 0.000067, 9.2s
2023-06-17 12:51:33 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [2/50] train_loss: 1.0477, val_loss: 0.8633, val_mse: 0.6460, lr: 0.000099, 9.1s
2023-06-17 12:51:43 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [3/50] train_loss: 1.0039, val_loss: 0.9287, val_mse: 0.6972, lr: 0.000097, 9.2s
2023-06-17 12:51:52 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [4/50] train_loss: 0.9543, val_loss: 0.7414, val_mse: 0.5560, lr: 0.000095, 8.8s
2023-06-17 12:52:01 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [5/50] train_loss: 0.9125, val_loss: 0.7524, val_mse: 0.5660, lr: 0.000093, 8.1s
2023-06-17 12:52:09 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [6/50] train_loss: 0.9342, val_loss: 0.6828, val_mse: 0.5135, lr: 0.000091, 8.1s
2023-06-17 12:52:18 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [7/50] train_loss: 0.8851, val_loss: 0.6962, val_mse: 0.5234, lr: 0.000089, 8.2s
2023-06-17 12:52:26 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [8/50] train_loss: 0.8771, val_loss: 0.6158, val_mse: 0.4621, lr: 0.000087, 8.1s
2023-06-17 12:52:35 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [9/50] train_loss: 0.8470, val_loss: 0.5859, val_mse: 0.4403, lr: 0.000085, 8.1s
2023-06-17 12:52:44 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [10/50] train_loss: 0.8372, val_loss: 0.6298, val_mse: 0.4755, lr: 0.000082, 8.4s
2023-06-17 12:52:52 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [11/50] train_loss: 0.8359, val_loss: 0.7237, val_mse: 0.5394, lr: 0.000080, 8.1s
2023-06-17 12:53:00 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [12/50] train_loss: 0.8217, val_loss: 0.5328, val_mse: 0.4006, lr: 0.000078, 8.0s
2023-06-17 12:53:09 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [13/50] train_loss: 0.8404, val_loss: 0.7307, val_mse: 0.5494, lr: 0.000076, 7.9s
2023-06-17 12:53:17 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [14/50] train_loss: 0.8167, val_loss: 0.5784, val_mse: 0.4335, lr: 0.000074, 8.2s
2023-06-17 12:53:25 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [15/50] train_loss: 0.7869, val_loss: 0.5668, val_mse: 0.4273, lr: 0.000072, 8.1s
2023-06-17 12:53:33 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [16/50] train_loss: 0.7866, val_loss: 0.5809, val_mse: 0.4385, lr: 0.000070, 8.2s
2023-06-17 12:53:41 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [17/50] train_loss: 0.7720, val_loss: 0.5132, val_mse: 0.3853, lr: 0.000068, 8.2s
2023-06-17 12:53:50 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [18/50] train_loss: 0.7649, val_loss: 0.6460, val_mse: 0.4820, lr: 0.000066, 8.1s
2023-06-17 12:53:58 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [19/50] train_loss: 0.7703, val_loss: 0.5627, val_mse: 0.4162, lr: 0.000064, 8.1s
2023-06-17 12:54:06 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [20/50] train_loss: 0.7510, val_loss: 0.5498, val_mse: 0.4076, lr: 0.000062, 8.2s
2023-06-17 12:54:15 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [21/50] train_loss: 0.7405, val_loss: 0.6078, val_mse: 0.4568, lr: 0.000060, 8.2s
2023-06-17 12:54:23 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [22/50] train_loss: 0.7389, val_loss: 0.5206, val_mse: 0.3895, lr: 0.000058, 7.9s
2023-06-17 12:54:31 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [23/50] train_loss: 0.7120, val_loss: 0.5197, val_mse: 0.3883, lr: 0.000056, 8.1s
2023-06-17 12:54:40 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [24/50] train_loss: 0.7085, val_loss: 0.5299, val_mse: 0.3929, lr: 0.000054, 9.1s
2023-06-17 12:54:49 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [25/50] train_loss: 0.7204, val_loss: 0.4924, val_mse: 0.3634, lr: 0.000052, 9.0s
2023-06-17 12:54:58 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [26/50] train_loss: 0.7203, val_loss: 0.4618, val_mse: 0.3460, lr: 0.000049, 8.8s
2023-06-17 12:55:07 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [27/50] train_loss: 0.7021, val_loss: 0.5251, val_mse: 0.3886, lr: 0.000047, 8.3s
2023-06-17 12:55:15 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [28/50] train_loss: 0.7067, val_loss: 0.5120, val_mse: 0.3827, lr: 0.000045, 8.0s
2023-06-17 12:55:23 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [29/50] train_loss: 0.6974, val_loss: 0.5871, val_mse: 0.4380, lr: 0.000043, 8.0s
2023-06-17 12:55:32 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [30/50] train_loss: 0.7017, val_loss: 0.4469, val_mse: 0.3339, lr: 0.000041, 8.2s
2023-06-17 12:55:40 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [31/50] train_loss: 0.6828, val_loss: 0.5249, val_mse: 0.3889, lr: 0.000039, 7.9s
2023-06-17 12:55:48 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [32/50] train_loss: 0.6813, val_loss: 0.5739, val_mse: 0.4252, lr: 0.000037, 8.2s
2023-06-17 12:55:57 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [33/50] train_loss: 0.6845, val_loss: 0.4893, val_mse: 0.3636, lr: 0.000035, 8.5s
2023-06-17 12:56:05 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [34/50] train_loss: 0.6813, val_loss: 0.6106, val_mse: 0.4473, lr: 0.000033, 8.2s
2023-06-17 12:56:13 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [35/50] train_loss: 0.6657, val_loss: 0.5357, val_mse: 0.4016, lr: 0.000031, 8.4s
2023-06-17 12:56:22 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [36/50] train_loss: 0.6654, val_loss: 0.5157, val_mse: 0.3824, lr: 0.000029, 8.1s
2023-06-17 12:56:30 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [37/50] train_loss: 0.6520, val_loss: 0.5144, val_mse: 0.3813, lr: 0.000027, 8.0s
2023-06-17 12:56:38 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [38/50] train_loss: 0.6396, val_loss: 0.5489, val_mse: 0.4070, lr: 0.000025, 8.2s
2023-06-17 12:56:46 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [39/50] train_loss: 0.6530, val_loss: 0.5142, val_mse: 0.3851, lr: 0.000023, 8.0s
2023-06-17 12:56:54 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [40/50] train_loss: 0.6644, val_loss: 0.5770, val_mse: 0.4274, lr: 0.000021, 8.1s
2023-06-17 12:56:54 | unimol/utils/metrics.py | 255 | WARNING | Uni-Mol(QSAR) | Early stopping at epoch: 40
2023-06-17 12:56:56 | unimol/tasks/trainer.py | 197 | INFO | Uni-Mol(QSAR) | load model success!
2023-06-17 12:56:57 | unimol/models/nnmodel.py | 123 | INFO | Uni-Mol(QSAR) | fold 2, result {'mse': 0.33389777, 'mae': 0.42934737, 'spearmanr': 0.7043604826771424, 'rmse': 0.5778389, 'r2': 0.5037052463708651}
2023-06-17 12:56:58 | unimol/models/unimol.py | 107 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2023-06-17 12:57:06 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [1/50] train_loss: 0.9963, val_loss: 1.1187, val_mse: 0.8181, lr: 0.000067, 8.1s
2023-06-17 12:57:16 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [2/50] train_loss: 0.9809, val_loss: 1.0758, val_mse: 0.7894, lr: 0.000099, 8.9s
2023-06-17 12:57:25 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [3/50] train_loss: 0.9744, val_loss: 0.9197, val_mse: 0.6816, lr: 0.000097, 8.3s
2023-06-17 12:57:34 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [4/50] train_loss: 0.8896, val_loss: 0.8931, val_mse: 0.6608, lr: 0.000095, 8.1s
2023-06-17 12:57:43 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [5/50] train_loss: 0.8730, val_loss: 0.7967, val_mse: 0.5925, lr: 0.000093, 8.1s
2023-06-17 12:57:52 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [6/50] train_loss: 0.8742, val_loss: 0.8469, val_mse: 0.6300, lr: 0.000091, 8.1s
2023-06-17 12:58:00 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [7/50] train_loss: 0.8481, val_loss: 0.7456, val_mse: 0.5564, lr: 0.000089, 8.1s
2023-06-17 12:58:09 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [8/50] train_loss: 0.8167, val_loss: 0.7243, val_mse: 0.5415, lr: 0.000087, 8.1s
2023-06-17 12:58:18 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [9/50] train_loss: 0.7877, val_loss: 0.7422, val_mse: 0.5536, lr: 0.000085, 8.1s
2023-06-17 12:58:26 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [10/50] train_loss: 0.7905, val_loss: 0.6694, val_mse: 0.5022, lr: 0.000082, 8.0s
2023-06-17 12:58:35 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [11/50] train_loss: 0.7806, val_loss: 0.6803, val_mse: 0.5080, lr: 0.000080, 8.2s
2023-06-17 12:58:43 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [12/50] train_loss: 0.7693, val_loss: 0.7758, val_mse: 0.5808, lr: 0.000078, 8.2s
2023-06-17 12:58:51 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [13/50] train_loss: 0.8012, val_loss: 0.6340, val_mse: 0.4780, lr: 0.000076, 8.1s
2023-06-17 12:59:00 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [14/50] train_loss: 0.7451, val_loss: 0.7638, val_mse: 0.5757, lr: 0.000074, 8.0s
2023-06-17 12:59:08 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [15/50] train_loss: 0.7363, val_loss: 0.6183, val_mse: 0.4703, lr: 0.000072, 8.3s
2023-06-17 12:59:17 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [16/50] train_loss: 0.7324, val_loss: 0.6532, val_mse: 0.4934, lr: 0.000070, 8.2s
2023-06-17 12:59:26 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [17/50] train_loss: 0.7169, val_loss: 0.6582, val_mse: 0.4971, lr: 0.000068, 8.6s
2023-06-17 12:59:35 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [18/50] train_loss: 0.6963, val_loss: 0.6372, val_mse: 0.4803, lr: 0.000066, 9.2s
2023-06-17 12:59:44 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [19/50] train_loss: 0.7022, val_loss: 0.6088, val_mse: 0.4620, lr: 0.000064, 9.0s
2023-06-17 12:59:53 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [20/50] train_loss: 0.6787, val_loss: 0.5849, val_mse: 0.4404, lr: 0.000062, 8.4s
2023-06-17 13:00:02 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [21/50] train_loss: 0.7170, val_loss: 0.6280, val_mse: 0.4728, lr: 0.000060, 8.1s
2023-06-17 13:00:10 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [22/50] train_loss: 0.6678, val_loss: 0.6732, val_mse: 0.5074, lr: 0.000058, 8.1s
2023-06-17 13:00:18 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [23/50] train_loss: 0.6440, val_loss: 0.5854, val_mse: 0.4393, lr: 0.000056, 7.9s
2023-06-17 13:00:26 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [24/50] train_loss: 0.6608, val_loss: 0.6185, val_mse: 0.4654, lr: 0.000054, 8.1s
2023-06-17 13:00:34 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [25/50] train_loss: 0.6663, val_loss: 0.6564, val_mse: 0.4948, lr: 0.000052, 8.0s
2023-06-17 13:00:42 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [26/50] train_loss: 0.6536, val_loss: 0.5631, val_mse: 0.4249, lr: 0.000049, 8.0s
2023-06-17 13:00:53 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [27/50] train_loss: 0.6346, val_loss: 0.5800, val_mse: 0.4309, lr: 0.000047, 9.1s
2023-06-17 13:01:02 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [28/50] train_loss: 0.6523, val_loss: 0.5852, val_mse: 0.4444, lr: 0.000045, 8.9s
2023-06-17 13:01:10 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [29/50] train_loss: 0.6302, val_loss: 0.5751, val_mse: 0.4311, lr: 0.000043, 8.2s
2023-06-17 13:01:18 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [30/50] train_loss: 0.6229, val_loss: 0.5828, val_mse: 0.4391, lr: 0.000041, 8.0s
2023-06-17 13:01:26 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [31/50] train_loss: 0.6291, val_loss: 0.5778, val_mse: 0.4351, lr: 0.000039, 8.1s
2023-06-17 13:01:34 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [32/50] train_loss: 0.5913, val_loss: 0.5777, val_mse: 0.4298, lr: 0.000037, 8.1s
2023-06-17 13:01:42 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [33/50] train_loss: 0.6336, val_loss: 0.5617, val_mse: 0.4222, lr: 0.000035, 8.2s
2023-06-17 13:01:51 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [34/50] train_loss: 0.6080, val_loss: 0.5359, val_mse: 0.4038, lr: 0.000033, 8.2s
2023-06-17 13:02:00 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [35/50] train_loss: 0.6015, val_loss: 0.5468, val_mse: 0.4091, lr: 0.000031, 8.0s
2023-06-17 13:02:08 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [36/50] train_loss: 0.5942, val_loss: 0.5709, val_mse: 0.4266, lr: 0.000029, 8.1s
2023-06-17 13:02:16 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [37/50] train_loss: 0.5994, val_loss: 0.5571, val_mse: 0.4149, lr: 0.000027, 8.1s
2023-06-17 13:02:24 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [38/50] train_loss: 0.6010, val_loss: 0.5653, val_mse: 0.4218, lr: 0.000025, 8.1s
2023-06-17 13:02:32 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [39/50] train_loss: 0.5760, val_loss: 0.5826, val_mse: 0.4352, lr: 0.000023, 8.0s
2023-06-17 13:02:40 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [40/50] train_loss: 0.6080, val_loss: 0.5744, val_mse: 0.4271, lr: 0.000021, 8.2s
2023-06-17 13:02:49 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [41/50] train_loss: 0.5717, val_loss: 0.5595, val_mse: 0.4186, lr: 0.000019, 8.1s
2023-06-17 13:02:57 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [42/50] train_loss: 0.5596, val_loss: 0.5536, val_mse: 0.4140, lr: 0.000016, 8.0s
2023-06-17 13:03:05 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [43/50] train_loss: 0.5461, val_loss: 0.5577, val_mse: 0.4174, lr: 0.000014, 8.1s
2023-06-17 13:03:13 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [44/50] train_loss: 0.5775, val_loss: 0.5613, val_mse: 0.4205, lr: 0.000012, 8.1s
2023-06-17 13:03:13 | unimol/utils/metrics.py | 255 | WARNING | Uni-Mol(QSAR) | Early stopping at epoch: 44
2023-06-17 13:03:14 | unimol/tasks/trainer.py | 197 | INFO | Uni-Mol(QSAR) | load model success!
2023-06-17 13:03:15 | unimol/models/nnmodel.py | 123 | INFO | Uni-Mol(QSAR) | fold 3, result {'mse': 0.40383738, 'mae': 0.45316046, 'spearmanr': 0.7082075781363502, 'rmse': 0.635482, 'r2': 0.4945312503291812}
2023-06-17 13:03:16 | unimol/models/unimol.py | 107 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2023-06-17 13:03:24 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [1/50] train_loss: 1.0177, val_loss: 0.9977, val_mse: 0.7278, lr: 0.000067, 8.1s
2023-06-17 13:03:33 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [2/50] train_loss: 1.0013, val_loss: 1.1132, val_mse: 0.8146, lr: 0.000099, 8.0s
2023-06-17 13:03:42 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [3/50] train_loss: 0.9368, val_loss: 0.8208, val_mse: 0.5932, lr: 0.000097, 8.8s
2023-06-17 13:03:51 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [4/50] train_loss: 0.8955, val_loss: 0.8036, val_mse: 0.5778, lr: 0.000095, 8.8s
2023-06-17 13:04:01 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [5/50] train_loss: 0.8553, val_loss: 0.7418, val_mse: 0.5351, lr: 0.000093, 8.9s
2023-06-17 13:04:10 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [6/50] train_loss: 0.8764, val_loss: 0.9070, val_mse: 0.6461, lr: 0.000091, 8.1s
2023-06-17 13:04:18 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [7/50] train_loss: 0.8546, val_loss: 0.6987, val_mse: 0.5121, lr: 0.000089, 8.0s
2023-06-17 13:04:28 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [8/50] train_loss: 0.8261, val_loss: 0.7095, val_mse: 0.5165, lr: 0.000087, 9.0s
2023-06-17 13:04:37 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [9/50] train_loss: 0.8314, val_loss: 0.6756, val_mse: 0.4913, lr: 0.000085, 9.0s
2023-06-17 13:04:46 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [10/50] train_loss: 0.8112, val_loss: 0.7011, val_mse: 0.5129, lr: 0.000082, 8.0s
2023-06-17 13:04:53 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [11/50] train_loss: 0.7951, val_loss: 0.6594, val_mse: 0.4772, lr: 0.000080, 7.8s
2023-06-17 13:05:02 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [12/50] train_loss: 0.8264, val_loss: 0.6824, val_mse: 0.5008, lr: 0.000078, 8.0s
2023-06-17 13:05:10 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [13/50] train_loss: 0.7938, val_loss: 0.6051, val_mse: 0.4428, lr: 0.000076, 8.0s
2023-06-17 13:05:19 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [14/50] train_loss: 0.7698, val_loss: 0.5837, val_mse: 0.4272, lr: 0.000074, 8.0s
2023-06-17 13:05:28 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [15/50] train_loss: 0.7579, val_loss: 0.6009, val_mse: 0.4390, lr: 0.000072, 8.0s
2023-06-17 13:05:36 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [16/50] train_loss: 0.7400, val_loss: 0.6783, val_mse: 0.4961, lr: 0.000070, 8.0s
2023-06-17 13:05:44 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [17/50] train_loss: 0.7337, val_loss: 0.6855, val_mse: 0.4961, lr: 0.000068, 7.9s
2023-06-17 13:05:52 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [18/50] train_loss: 0.7444, val_loss: 0.6196, val_mse: 0.4444, lr: 0.000066, 8.3s
2023-06-17 13:06:00 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [19/50] train_loss: 0.7323, val_loss: 0.5721, val_mse: 0.4148, lr: 0.000064, 8.2s
2023-06-17 13:06:09 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [20/50] train_loss: 0.7124, val_loss: 0.5886, val_mse: 0.4301, lr: 0.000062, 8.0s
2023-06-17 13:06:18 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [21/50] train_loss: 0.7051, val_loss: 0.5489, val_mse: 0.3975, lr: 0.000060, 8.4s
2023-06-17 13:06:27 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [22/50] train_loss: 0.7238, val_loss: 0.5354, val_mse: 0.3884, lr: 0.000058, 8.7s
2023-06-17 13:06:36 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [23/50] train_loss: 0.7240, val_loss: 0.6725, val_mse: 0.4873, lr: 0.000056, 8.1s
2023-06-17 13:06:44 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [24/50] train_loss: 0.6937, val_loss: 0.5550, val_mse: 0.4090, lr: 0.000054, 7.9s
2023-06-17 13:06:52 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [25/50] train_loss: 0.6826, val_loss: 0.6393, val_mse: 0.4632, lr: 0.000052, 8.0s
2023-06-17 13:07:00 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [26/50] train_loss: 0.6814, val_loss: 0.5455, val_mse: 0.3975, lr: 0.000049, 7.9s
2023-06-17 13:07:08 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [27/50] train_loss: 0.6589, val_loss: 0.5434, val_mse: 0.3960, lr: 0.000047, 8.0s
2023-06-17 13:07:16 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [28/50] train_loss: 0.6763, val_loss: 0.5872, val_mse: 0.4352, lr: 0.000045, 8.0s
2023-06-17 13:07:23 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [29/50] train_loss: 0.6659, val_loss: 0.5268, val_mse: 0.3853, lr: 0.000043, 7.8s
2023-06-17 13:07:32 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [30/50] train_loss: 0.6684, val_loss: 0.5575, val_mse: 0.4024, lr: 0.000041, 8.0s
2023-06-17 13:07:40 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [31/50] train_loss: 0.6467, val_loss: 0.5300, val_mse: 0.3829, lr: 0.000039, 8.0s
2023-06-17 13:07:50 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [32/50] train_loss: 0.6467, val_loss: 0.5846, val_mse: 0.4279, lr: 0.000037, 8.9s
2023-06-17 13:07:57 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [33/50] train_loss: 0.6270, val_loss: 0.5577, val_mse: 0.4093, lr: 0.000035, 7.9s
2023-06-17 13:08:05 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [34/50] train_loss: 0.6427, val_loss: 0.5868, val_mse: 0.4248, lr: 0.000033, 8.0s
2023-06-17 13:08:13 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [35/50] train_loss: 0.6501, val_loss: 0.5432, val_mse: 0.3935, lr: 0.000031, 7.9s
2023-06-17 13:08:21 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [36/50] train_loss: 0.6305, val_loss: 0.5176, val_mse: 0.3793, lr: 0.000029, 8.0s
2023-06-17 13:08:30 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [37/50] train_loss: 0.6297, val_loss: 0.5156, val_mse: 0.3713, lr: 0.000027, 7.9s
2023-06-17 13:08:39 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [38/50] train_loss: 0.6432, val_loss: 0.5531, val_mse: 0.4048, lr: 0.000025, 8.0s
2023-06-17 13:08:47 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [39/50] train_loss: 0.6161, val_loss: 0.5432, val_mse: 0.3905, lr: 0.000023, 7.9s
2023-06-17 13:08:55 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [40/50] train_loss: 0.6009, val_loss: 0.5115, val_mse: 0.3746, lr: 0.000021, 7.9s
2023-06-17 13:09:03 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [41/50] train_loss: 0.6052, val_loss: 0.5091, val_mse: 0.3698, lr: 0.000019, 7.9s
2023-06-17 13:09:11 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [42/50] train_loss: 0.6008, val_loss: 0.5425, val_mse: 0.3979, lr: 0.000016, 8.0s
2023-06-17 13:09:19 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [43/50] train_loss: 0.5973, val_loss: 0.5384, val_mse: 0.3888, lr: 0.000014, 8.0s
2023-06-17 13:09:27 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [44/50] train_loss: 0.6030, val_loss: 0.5716, val_mse: 0.4130, lr: 0.000012, 7.9s
2023-06-17 13:09:35 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [45/50] train_loss: 0.5957, val_loss: 0.5366, val_mse: 0.3926, lr: 0.000010, 8.0s
2023-06-17 13:09:43 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [46/50] train_loss: 0.6118, val_loss: 0.5083, val_mse: 0.3681, lr: 0.000008, 8.0s
2023-06-17 13:09:53 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [47/50] train_loss: 0.5846, val_loss: 0.5475, val_mse: 0.3945, lr: 0.000006, 8.9s
2023-06-17 13:10:01 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [48/50] train_loss: 0.5963, val_loss: 0.5365, val_mse: 0.3876, lr: 0.000004, 8.3s
2023-06-17 13:10:09 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [49/50] train_loss: 0.5992, val_loss: 0.5280, val_mse: 0.3823, lr: 0.000002, 8.0s
2023-06-17 13:10:18 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [50/50] train_loss: 0.5823, val_loss: 0.5252, val_mse: 0.3796, lr: 0.000000, 8.9s
2023-06-17 13:10:20 | unimol/tasks/trainer.py | 197 | INFO | Uni-Mol(QSAR) | load model success!
2023-06-17 13:10:21 | unimol/models/nnmodel.py | 123 | INFO | Uni-Mol(QSAR) | fold 4, result {'mse': 0.3681449, 'mae': 0.4405691, 'spearmanr': 0.7418240886637215, 'rmse': 0.6067495, 'r2': 0.5089094245543386}
2023-06-17 13:10:21 | unimol/models/nnmodel.py | 135 | INFO | Uni-Mol(QSAR) | Uni-Mol metrics score: 
{'mse': 0.3771468026353712, 'mae': 0.4450550450234257, 'spearmanr': 0.7078743179462539, 'rmse': 0.6141227911707652, 'r2': 0.4866199624063017}
2023-06-17 13:10:21 | unimol/models/nnmodel.py | 136 | INFO | Uni-Mol(QSAR) | Uni-Mol & Metric result saved!

Training is complete: averaged over the five cross-validation folds, Uni-Mol reaches a validation MSE of about 0.377 (R² ≈ 0.487, Spearman ≈ 0.708). We can now load the saved model with MolPredict and evaluate it on the training and test sets.
[9]
from unimol import MolPredict
from sklearn.metrics import mean_squared_error

# Load the trained model from the experiment directory; predict() regenerates
# conformers for each input file and logs its own metrics along the way
predm = MolPredict(load_model='./exp_reg_hERG_0616')
pred_train_y = predm.predict('datasets/hERG_train.csv').reshape(-1)
pred_test_y = predm.predict('datasets/hERG_test.csv').reshape(-1)

# Record the test-set performance for the model comparison below
mse = mean_squared_error(test_y, pred_test_y)
se = abs(test_y - pred_test_y)
results["Uni-Mol"] = {"MSE": mse, "error": se}
print(f"[Uni-Mol]\tMSE:{mse:.4f}")
2023-06-17 13:21:14 | unimol/data/conformer.py | 56 | INFO | Uni-Mol(QSAR) | Start generating conformers...
7363it [02:58, 41.14it/s]
2023-06-17 13:24:13 | unimol/data/conformer.py | 60 | INFO | Uni-Mol(QSAR) | Failed to generate conformers for 0.01% of molecules.
2023-06-17 13:24:14 | unimol/data/conformer.py | 62 | INFO | Uni-Mol(QSAR) | Failed to generate 3d conformers for 0.07% of molecules.
2023-06-17 13:24:14 | unimol/models/unimol.py | 107 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2023-06-17 13:24:14 | unimol/models/nnmodel.py | 145 | INFO | Uni-Mol(QSAR) | start predict NNModel:unimolv1
2023-06-17 13:24:15 | unimol/tasks/trainer.py | 197 | INFO | Uni-Mol(QSAR) | load model success!
2023-06-17 13:24:21 | unimol/tasks/trainer.py | 197 | INFO | Uni-Mol(QSAR) | load model success!
2023-06-17 13:24:27 | unimol/tasks/trainer.py | 197 | INFO | Uni-Mol(QSAR) | load model success!
2023-06-17 13:24:33 | unimol/tasks/trainer.py | 197 | INFO | Uni-Mol(QSAR) | load model success!
2023-06-17 13:24:40 | unimol/tasks/trainer.py | 197 | INFO | Uni-Mol(QSAR) | load model success!
2023-06-17 13:24:45 | unimol/predict.py | 66 | INFO | Uni-Mol(QSAR) | final predict metrics score: 
{'mse': 0.20172259087068667, 'mae': 0.2717721822179887, 'spearmanr': 0.9094161936023004, 'rmse': 0.4491353814505006, 'r2': 0.7876258838726424}
2023-06-17 13:24:46 | unimol/data/conformer.py | 56 | INFO | Uni-Mol(QSAR) | Start generating conformers...
1841it [00:40, 45.34it/s]
2023-06-17 13:25:27 | unimol/data/conformer.py | 60 | INFO | Uni-Mol(QSAR) | Failed to generate conformers for 0.00% of molecules.
2023-06-17 13:25:27 | unimol/data/conformer.py | 62 | INFO | Uni-Mol(QSAR) | Failed to generate 3d conformers for 0.05% of molecules.
2023-06-17 13:25:28 | unimol/models/unimol.py | 107 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2023-06-17 13:25:28 | unimol/models/nnmodel.py | 145 | INFO | Uni-Mol(QSAR) | start predict NNModel:unimolv1
2023-06-17 13:25:28 | unimol/tasks/trainer.py | 197 | INFO | Uni-Mol(QSAR) | load model success!
2023-06-17 13:25:30 | unimol/tasks/trainer.py | 197 | INFO | Uni-Mol(QSAR) | load model success!
2023-06-17 13:25:32 | unimol/tasks/trainer.py | 197 | INFO | Uni-Mol(QSAR) | load model success!
2023-06-17 13:25:34 | unimol/tasks/trainer.py | 197 | INFO | Uni-Mol(QSAR) | load model success!
2023-06-17 13:25:36 | unimol/tasks/trainer.py | 197 | INFO | Uni-Mol(QSAR) | load model success!
2023-06-17 13:25:38 | unimol/predict.py | 66 | INFO | Uni-Mol(QSAR) | final predict metrics score: 
{'mse': 0.4197742218716444, 'mae': 0.42912608007320174, 'spearmanr': 0.7708930974024512, 'rmse': 0.6478998548168108, 'r2': 0.5847316207841755}
[Uni-Mol]	MSE:0.4198
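
On the held-out test set, Uni-Mol reaches an MSE of 0.4198, noticeably above its training-set MSE of 0.2017, so some overfitting remains. One practical note: the stacking experiment at the end of this notebook only considers models whose predictions were stored as columns in `train_data`/`test_data`. If you wanted Uni-Mol to participate as a base model there, you could store its predictions first; a minimal sketch, assuming the `train_data`/`test_data` DataFrames from the earlier cells:

# Optional: keep Uni-Mol's predictions alongside the other models' columns,
# so they could be reused later (e.g., by the stacking ensemble below)
train_data["Uni-Mol_pred"] = pred_train_y
test_data["Uni-Mol_pred"] = pred_test_y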
[10]
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Collect the absolute errors of the Uni-Mol model into a long-format DataFrame
residuals_data = []
for name, result in results.items():
    if name.startswith("Uni-Mol"):
        model_residuals = pd.DataFrame({"Model": name, "Error": result["error"]})
        residuals_data.append(model_residuals)

residuals_df = pd.concat(residuals_data, ignore_index=True)
residuals_df.sort_values(by="Error", ascending=True, inplace=True)
model_order = residuals_df.groupby("Model")["Error"].median().sort_values(ascending=True).index

# Box plot of absolute errors, ordered by median error
plt.figure(figsize=(10, 7), dpi=150)
font = {'family': 'serif',
        'color': 'black',
        'weight': 'normal',
        'size': 15}
sns.boxplot(y="Model", x="Error", data=residuals_df, order=model_order)
plt.xlabel("Abs Error", fontdict=font)
plt.ylabel("Models", fontdict=font)
plt.show()
[Figure: box plot of the Uni-Mol model's absolute prediction errors on the test set]

Results Overview

Finally, let's put everything side by side: the 1D-, 2D-, and 3D-QSAR pipelines under each machine learning model, together with Uni-Mol, all evaluated on the same hERG test set. Note in the table below that several of the 3D-QSAR fits (Lasso, ElasticNet, Multi-layer Perceptron, Ridge, and Linear Regression) blow up to astronomically large MSEs; this is a typical symptom of an ill-conditioned, unscaled 3D descriptor matrix.
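
For a quick first glance, we can pull out the best-performing model within each representation family from the `results` dictionary; a minimal sketch (assuming the "1D-QSAR-…" / "2D-QSAR-…" / "3D-QSAR-…" naming convention used throughout this notebook):

# Best model per representation family (1D-QSAR / 2D-QSAR / 3D-QSAR / Uni-Mol)
best_per_family = {}
for name, r in results.items():
    family = name.split("-QSAR-")[0] + "-QSAR" if "-QSAR-" in name else name
    if family not in best_per_family or r["MSE"] < best_per_family[family][1]:
        best_per_family[family] = (name, r["MSE"])
for family, (name, mse) in sorted(best_per_family.items(), key=lambda kv: kv[1][1]):
    print(f"{family}: best = {name} (MSE={mse:.4f})")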

[27]
import pandas as pd

# Gather every model's test-set MSE (and per-sample absolute errors)
# into one table, sorted from best to worst
df = pd.DataFrame(results).T
df.sort_values(by="MSE", ascending=True, inplace=True)
df
Model	MSE	error
Uni-Mol 0.419774 [2.522239303588867, 2.0335350990295407, 2.1235...
2D-QSAR-Support Vector 0.455441 [1.6594621254469004, 1.801769913338167, 1.3386...
2D-QSAR-XGBoost 0.459129 [1.523523902893066, 1.5693136215209957, 0.7394...
2D-QSAR-Random Forest 0.47166 [1.9880250000000013, 2.382200000000001, 0.8454...
2D-QSAR-LightGBM 0.479684 [2.022284730700359, 2.591602960937026, 0.79469...
2D-QSAR-K-Nearest Neighbors 0.480645 [1.5099999999999998, 1.5079999999999973, 0.975...
1D-QSAR-Random Forest 0.605183 [2.3907239177489146, 2.4765941666666667, 3.016...
1D-QSAR-XGBoost 0.605652 [2.509926891326904, 3.200466728210449, 2.42616...
1D-QSAR-LightGBM 0.642647 [2.346929558613308, 2.5087293396443835, 2.5538...
2D-QSAR-Gradient Boosting 0.669449 [2.918999383876205, 2.653413649160223, 2.55135...
2D-QSAR-Multi-layer Perceptron 0.693308 [1.1202709345431376, 1.3843046457283936, 1.224...
2D-QSAR-Ridge Regression 0.715356 [2.78798850792775, 2.1465278084733654, 2.51336...
2D-QSAR-Linear Regression 0.715559 [2.785544528615863, 2.1429891031766406, 2.5107...
3D-QSAR-LightGBM 0.730661 [3.8540520524439525, 1.364569493019939, 1.6295...
1D-QSAR-Gradient Boosting 0.760707 [4.202714018353101, 4.082667464743396, 3.79711...
3D-QSAR-Random Forest 0.783114 [3.5546999999999995, 2.4127666666666663, 2.851...
3D-QSAR-XGBoost 0.810273 [3.825884914398193, 0.8879476547241207, 1.3634...
3D-QSAR-Gradient Boosting 0.866329 [4.33815854578517, 2.8987060129646505, 2.78310...
1D-QSAR-Ridge Regression 0.885736 [4.533467314501845, 4.120692997179958, 4.01560...
1D-QSAR-Linear Regression 0.885739 [4.533419974565081, 4.120610897612493, 4.01553...
2D-QSAR-Decision Tree 0.889239 [0.5700000000000003, 0.17999999999999972, 1.54...
1D-QSAR-K-Nearest Neighbors 0.911012 [2.523999999999999, 3.3120000000000003, 4.3239...
1D-QSAR-ElasticNet Regression 0.926934 [4.59931604048185, 4.317325121519717, 4.137556...
1D-QSAR-Lasso Regression 0.928588 [4.596617256928905, 4.317406647064892, 4.13327...
1D-QSAR-Multi-layer Perceptron 0.938524 [4.539483485335261, 4.240490901203048, 4.04913...
1D-QSAR-Support Vector 0.939777 [4.7594409976508505, 4.452918330671407, 4.2889...
2D-QSAR-ElasticNet Regression 1.010851 [4.570272986554394, 4.320272986554394, 4.09027...
2D-QSAR-Lasso Regression 1.010851 [4.570272986554394, 4.320272986554394, 4.09027...
3D-QSAR-Support Vector 1.042737 [4.750099446088549, 4.500099446088741, 4.27009...
1D-QSAR-Decision Tree 1.057852 [2.523999999999999, 2.8999999999999995, 2.3549...
3D-QSAR-K-Nearest Neighbors 1.194292 [4.92, 4.12, 5.353999999999999, 4.231999999999...
3D-QSAR-Decision Tree 1.598439 [3.0299999999999994, 1.1399999999999988, 1.659...
3D-QSAR-Lasso Regression 805.758003 [4.569499870078355, 4.319499281941863, 4.08949...
3D-QSAR-ElasticNet Regression 2390.261763 [4.56929324838699, 4.319292354357786, 4.089292...
3D-QSAR-Multi-layer Perceptron 3482168556886455484416.0 [3804401.3051103745, 3804401.5506377984, 38044...
3D-QSAR-Ridge Regression 4392658479297741235159040.0 [4.323969041648061, 3.9322019482891504, 3.6812...
3D-QSAR-Linear Regression 34953863171341550859845632.0 [4.28654773084929, 3.4351554464664877, 3.17109...
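
As noted above, the exploding MSEs of the scale-sensitive 3D-QSAR models (the linear fits and the MLP) usually come down to unstandardized, ill-conditioned descriptors. A common remedy is to wrap such models in a scaling pipeline; a minimal sketch, where `train_x_3d` stands in for the 3D descriptor matrix built earlier in this notebook (the name is an assumption):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

# Standardize the 3D descriptors before the linear fit to avoid numerical blow-up
model = make_pipeline(StandardScaler(), Ridge(random_state=42))
model.fit(train_x_3d, train_y)  # train_x_3d: hypothetical 3D descriptor matrix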
[28]
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Collect absolute errors for every model, skipping those whose MSE exploded
# (the scale-sensitive 3D-QSAR fits in the table above)
residuals_data = []
for name, result in results.items():
    if result["MSE"] > 10:
        continue
    model_residuals = pd.DataFrame({"Model": name, "Error": result["error"]})
    residuals_data.append(model_residuals)

residuals_df = pd.concat(residuals_data, ignore_index=True)
residuals_df.sort_values(by="Error", ascending=True, inplace=True)
model_order = residuals_df.groupby("Model")["Error"].median().sort_values(ascending=True).index

# Box plot of absolute errors, ordered by median error
plt.figure(figsize=(10, 7), dpi=150)
font = {'family': 'serif',
        'color': 'black',
        'weight': 'normal',
        'size': 15}
sns.boxplot(y="Model", x="Error", data=residuals_df, order=model_order)
plt.xlabel("Abs Error", fontdict=font)
plt.ylabel("Models", fontdict=font)
plt.show()
[Figure: box plot of absolute test-set errors for every model with MSE < 10, ordered by median error]
代码
文本

One More Thing

By this point, `results` holds dozens of individual QSAR models. Can we squeeze out a little more accuracy by combining them? A classic trick is stacking (stacked generalization): use the predictions of the best base models as input features for a second-level "meta" model. Below, we select the five best classical base models by test MSE, stack their predictions into meta feature matrices, and train a panel of meta-learners on top. For simplicity, the training-set meta-features here are in-sample predictions of the base models; a more rigorous setup would use out-of-fold predictions to avoid leakage.

[39]
import numpy as np

# Calculate the performance of each model on the test set
mse_scores = [(k, results[k]["MSE"]) for k in results.keys()]

# Sort the models by test MSE (ascending: best first)
mse_scores.sort(key=lambda x: x[1])

# Select the five best classical base models. Index 0 (Uni-Mol) is skipped:
# its predictions are not stored as columns in train_data/test_data
top_5_models = mse_scores[1:6]

# Output the names and performance metrics of the selected models
print("Top 5 models:")
for name, mse in top_5_models:
    print(f"{name}: MSE={mse:.4f}")

# Get the predictions of the selected models on the train and test sets
top_5_train_predictions = [train_data[f"{name}_pred"].values for name, _ in top_5_models]
top_5_test_predictions = [test_data[f"{name}_pred"].values for name, _ in top_5_models]

# Stack the predictions column-wise into meta feature matrices
meta_train_x = np.column_stack(top_5_train_predictions)
meta_test_x = np.column_stack(top_5_test_predictions)

Top 5 models:
2D-QSAR-Support Vector: MSE=0.4554
2D-QSAR-XGBoost: MSE=0.4591
2D-QSAR-Random Forest: MSE=0.4717
2D-QSAR-LightGBM: MSE=0.4797
2D-QSAR-K-Nearest Neighbors: MSE=0.4806
[40]
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor

# Define the list of meta-learners
meta_regressors = [
    ("Linear Regression", LinearRegression()),                  # Linear regression model
    ("Ridge Regression", Ridge(random_state=42)),               # Ridge regression model
    ("Lasso Regression", Lasso(random_state=42)),               # Lasso regression model
    ("ElasticNet Regression", ElasticNet(random_state=42)),     # ElasticNet regression model
    ("Support Vector", SVR()),                                  # Support vector regression model
    ("K-Nearest Neighbors", KNeighborsRegressor()),             # K-nearest neighbors regression model
    ("Decision Tree", DecisionTreeRegressor(random_state=42)),  # Decision tree regression model
    ("Random Forest", RandomForestRegressor(random_state=42)),  # Random forest regression model
    ("Gradient Boosting", GradientBoostingRegressor(random_state=42)),  # Gradient boosting regression model
    ("XGBoost", XGBRegressor(random_state=42)),                 # XGBoost regression model
    ("LightGBM", LGBMRegressor(random_state=42)),               # LightGBM regression model
    ("Multi-layer Perceptron", MLPRegressor(                    # Multi-layer perceptron (neural network)
        hidden_layer_sizes=(128, 64, 32),
        learning_rate_init=0.0001,
        activation='relu', solver='adam',
        max_iter=10000, random_state=42)),
]

# Train meta-learners and collect their prediction results
for name, regressor in meta_regressors:
    # Train the meta-model on the stacked base-model predictions
    regressor.fit(meta_train_x, train_y)

    # Predict using the meta-model
    pred_meta_train_y = regressor.predict(meta_train_x)
    pred_meta_test_y = regressor.predict(meta_test_x)

    # Add meta-model predictions to the training and testing data
    train_data[f"META-{name}_pred"] = pred_meta_train_y
    test_data[f"META-{name}_pred"] = pred_meta_test_y

    # Calculate performance metrics on the test data
    mse_meta = mean_squared_error(test_y, pred_meta_test_y)
    se_meta = abs(test_y - pred_meta_test_y)
    results[f"META-{name}"] = {"MSE": mse_meta, "error": se_meta}
    print(f"[META][{name}]\tMSE:{mse_meta:.4f}")

[META][Linear Regression]	MSE:0.4761
[META][Ridge Regression]	MSE:0.4750
[META][Lasso Regression]	MSE:1.0109
[META][ElasticNet Regression]	MSE:0.6884
[META][Support Vector]	MSE:0.4874
[META][K-Nearest Neighbors]	MSE:0.4511
[META][Decision Tree]	MSE:0.4866
[META][Random Forest]	MSE:0.4824
[META][Gradient Boosting]	MSE:0.4765
[META][XGBoost]	MSE:0.4829
[META][LightGBM]	MSE:0.4774
[META][Multi-layer Perceptron]	MSE:0.4604
[47]
# Calculate the performance of each meta-learner on the test set
mse_scores = []
for name, regressor in meta_regressors:
    pred_test_y = regressor.predict(meta_test_x)
    mse = mean_squared_error(test_y, pred_test_y)
    mse_scores.append((name, mse))

# Sort the meta-learners based on performance metrics
mse_scores.sort(key=lambda x: x[1])

# Select the top five best-performing meta-learners
top_5_regressors = mse_scores[:5]

# Initialize accumulators for the averaged prediction results
pred_meta_train_y_top5_avg = np.zeros_like(train_y)
pred_meta_test_y_top5_avg = np.zeros_like(test_y)

# Calculate the average predictions of the top five meta-learners
for name, _ in top_5_regressors:
    regressor = dict(meta_regressors)[name]
    pred_meta_train_y_top5_avg += regressor.predict(meta_train_x)
    pred_meta_test_y_top5_avg += regressor.predict(meta_test_x)
pred_meta_train_y_top5_avg /= len(top_5_regressors)
pred_meta_test_y_top5_avg /= len(top_5_regressors)

# Add the average prediction results to the training and testing data
train_data["Top5_Meta_pred"] = pred_meta_train_y_top5_avg
test_data["Top5_Meta_pred"] = pred_meta_test_y_top5_avg

# Calculate performance metrics for the averaged predictions
mse_top5_meta = mean_squared_error(test_y, pred_meta_test_y_top5_avg)
se_top5_meta = abs(test_y - pred_meta_test_y_top5_avg)
results["Top5_Meta"] = {"MSE": mse_top5_meta, "error": se_top5_meta}
print(f"[Top 5 Meta Model]\tMSE:{mse_top5_meta:.4f}")

# Compare the averaged ensemble against each individual meta-learner
for name in [name for name, _ in top_5_regressors]:
    mse_single = results[f"META-{name}"]["MSE"]
    performance_gain = mse_single - mse_top5_meta
    print(f"[Ensemble][{name} vs. Top5_Meta]\tPerformance Gain (MSE): {performance_gain:.4f}")

[Top 5 Meta Model]	MSE:0.4656
[Ensemble][K-Nearest Neighbors vs. Top5_Meta]	Performance Gain (MSE): -0.0145
[Ensemble][Multi-layer Perceptron vs. Top5_Meta]	Performance Gain (MSE): -0.0052
[Ensemble][Ridge Regression vs. Top5_Meta]	Performance Gain (MSE): 0.0094
[Ensemble][Linear Regression vs. Top5_Meta]	Performance Gain (MSE): 0.0105
[Ensemble][Gradient Boosting vs. Top5_Meta]	Performance Gain (MSE): 0.0110
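
Two takeaways: averaging the top five meta-learners (MSE 0.4656) does not beat the best single meta-learner here (META-K-Nearest Neighbors, MSE 0.4511), and Uni-Mol (MSE 0.4198) remains the strongest individual model on this dataset. Since every META result was appended to `results`, re-drawing the earlier residual box plot now includes them too; a minimal sketch, reusing `results` and the plotting imports from the cells above:

# Rebuild the residual box plot, now including the META models
residuals_df = pd.concat(
    [pd.DataFrame({"Model": name, "Error": r["error"]})
     for name, r in results.items() if r["MSE"] < 10],  # skip the exploded fits
    ignore_index=True,
)
model_order = residuals_df.groupby("Model")["Error"].median().sort_values().index
plt.figure(figsize=(10, 9), dpi=150)
sns.boxplot(y="Model", x="Error", data=residuals_df, order=model_order)
plt.xlabel("Abs Error")
plt.ylabel("Models")
plt.show()

That wraps up our tour from the 1D/2D/3D QSAR baselines through Uni-Mol to stacked ensembles: on this hERG regression task, Uni-Mol's pretrained 3D representation comes out on top.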