Quantitative Structure-Activity Relationship (QSAR) Model from 0 to 1 & Uni-Mol Introductory Practice (Regression Task)
©️ Copyright 2023 @ Authors
Author:
Hang Zheng 📨
Date: 2023-06-16
Sharing Agreement: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Quick Start: Click the Start Connection button above, select the unimol-qsar:v0.2 image and any GPU node configuration, and wait a moment to run.
In recent years, Artificial Intelligence (AI) has been developing at an unprecedented speed, bringing significant breakthroughs and transformations to various fields.
In fact, drug scientists have been using mathematical and statistical methods to aid drug development since the last century. Based on the structure of drug molecules, they build mathematical models to predict the biochemical activity of drugs. This method is known as Quantitative Structure-Activity Relationship (QSAR). QSAR models have continued to evolve as research on drug molecules has deepened and more AI methods have been introduced.
It can be said that QSAR models are a good microcosm of the development of the AI for Science field. In this Notebook, we will introduce the construction methods of different types of QSAR models in the form of case studies.
Introduction
Quantitative Structure-Activity Relationship (QSAR) is a method that studies the quantitative relationship between the chemical structure of compounds and their biological activity. It is one of the most important tools in Computer-Aided Drug Design (CADD). QSAR aims to establish mathematical models to relate molecular structures with their biochemical and physicochemical properties, helping drug scientists to make rational predictions about the properties of new drug molecules.
Building an effective QSAR model involves several steps:
- Constructing a reasonable molecular representation, which converts molecular structures into computer-readable numerical representations;
- Selecting a suitable machine learning model for the molecular representation and using existing molecule-property data to train the model;
- Using the trained machine learning model to predict the properties of molecules with unknown properties.
QSAR models have developed in step with advances in molecular representation techniques and the machine learning models built on top of them. The case studies in this notebook follow that progression.
Setup output (abridged): pip installed seaborn-0.12.2 and lightgbm-3.3.5, and the hERG dataset (560684 bytes) was downloaded from https://dp-public.oss-cn-beijing.aliyuncs.com/community/hERG.csv and saved to datasets/hERG.csv.
Then, we can take a look at the composition of this dataset:
------------ Original data ------------
      SMILES                                                pIC50
0     Cc1ccc(CN2[C@@H]3CC[C@H]2C[C@@H](C3)Oc4cccc(c4...     9.85
1     COc1nc2ccc(Br)cc2cc1[C@@H](c3ccccc3)[C@@](O)(C...     9.70
2     NC(=O)c1cccc(O[C@@H]2C[C@H]3CC[C@@H](C2)N3CCCc...     9.60
3     CCCCCCCc1cccc([n+]1C)CCCCCCC                          9.60
4     Cc1ccc(CN2[C@@H]3CC[C@H]2C[C@@H](C3)Oc4cccc(c4...     9.59
...   ...                                                   ...
9199  O=C1[C@H]2N(c3ccc(OCC=CCCNCC(=O)Nc4c(Cl)cc(cc4...     4.89
9200  O=C1[C@H]2N(c3ccc(OCCCCCNCC(=O)Nc4c(Cl)cc(cc4C...     4.89
9201  O=C1[C@H]2N(c3ccc(OCC=CCCCNCC(=O)Nc4c(Cl)cc(cc...     4.89
9202  O=C1[C@H]2N(c3ccc(OCCCCCCNCC(=O)Nc4c(Cl)cc(cc4...     4.49
9203  O=C1N=C/C(=C2\N(c3c(cc(Cl)c(Cl)c3)N\2)Cc4cc(Cl...     5.30
[9204 rows x 2 columns]
<Figure size 900x600 with 1 Axes>
You can see that in the hERG dataset:
- Molecules are represented by SMILES strings;
- The task is regression: predict the inhibitory activity of each molecule against the target protein (here hERG), expressed as pIC50.
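For readers unfamiliar with the unit: pIC50 is the negative base-10 logarithm of the IC50 expressed in mol/L, so larger values mean stronger inhibition. A minimal sketch of the conversion follows; the function names are illustrative and not part of the notebook.

```python
import math

def ic50_nm_to_pic50(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_nm * 1e-9)

def pic50_to_ic50_nm(pic50: float) -> float:
    """Inverse conversion: pIC50 back to IC50 in nanomolar."""
    return 10 ** (-pic50) * 1e9

# An IC50 of 1 micromolar (1000 nM) corresponds to a pIC50 of 6.
print(round(ic50_nm_to_pic50(1000.0), 2))
# The highest pIC50 in the table above (9.85) is a sub-nanomolar IC50.
print(round(pic50_to_ic50_nm(9.85), 3))
```

Because the scale is logarithmic, a difference of 1 in pIC50 corresponds to a tenfold difference in IC50.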
This is a common molecular property prediction task. Alright, let's put this dataset aside for now. Next, let's officially start exploring.
A Brief History of QSAR
QSAR evolved from Structure-Activity Relationship (SAR) analysis. The origins of SAR can be traced back to the late 19th century, when chemists began studying the relationship between compound structures and biological activity. In 1894, German chemist Emil Fischer (1852-1919) proposed the "lock-and-key" hypothesis, suggesting that the interaction between compounds (keys) and biological targets (locks) depends on their spatial matching. As scientists deepened their understanding of molecular interactions, they realized that besides spatial matching, the properties of the target surface (e.g., hydrophobicity, electrophilicity) and the corresponding properties of the ligand structure were also crucial. This led to a series of methods for evaluating structural characteristics and binding affinity, known as Structure-Activity Relationships.
However, the SAR method mainly relied on the experience and intuitive judgment of chemists, lacking a rigorous theoretical foundation and unified analytical approach. To overcome these limitations, scientists began using mathematical and statistical methods in the 1960s to conduct quantitative analysis of the relationship between molecular structure and biological activity.
The idea of relating structure to activity quantitatively is much older, however. The earliest QSAR-type model dates back to 1868, when chemist Alexander Crum Brown and physiologist Thomas R. Fraser studied the biological effects of alkaloids before and after methylation of their basic nitrogen atoms. They proposed that the physiological activity of a compound depends on its chemical composition, i.e., that biological activity $\Phi$ is a function of the compound's composition $C$: $\Phi = f(C)$. This is known as the Crum-Brown equation, and it laid the foundation for later QSAR research.
Subsequently, various QSAR models were proposed in academia, such as the model introduced by Hammett, which linked the toxicity of organic compounds to their electronic properties, and the steric parameter model proposed by Taft. In 1964, Hansch and Fujita introduced the well-known Hansch model, which holds that a molecule's biological activity is mainly determined by its hydrophobic effect ($\pi$), steric effect ($E_s$), and electronic effect ($\sigma$), and assumes that these three effects are independent and additive. The complete form of the model is $\log(1/C) = a\pi + bE_s + c\sigma + k$, where $C$ here denotes the concentration of the compound required to produce a given biological response and $a$, $b$, $c$, and $k$ are fitted constants. The Hansch model was the first to quantitatively describe the relationship between chemical information and drug biological activity, providing a practical theoretical framework for subsequent QSAR research. It is considered a crucial milestone in the transition from blind drug design to rational drug design.
Today, QSAR has developed into a mature research field involving various computational methods and techniques. In recent years, with the rapid development of machine learning and artificial intelligence technologies, QSAR methods have been further expanded and applied. For example, deep learning techniques have been used to build QSAR models, enhancing their predictive capabilities and accuracy. Furthermore, QSAR methods have found broad applications in fields such as environmental science and materials science, demonstrating strong potential and a wide range of application prospects.
Basic Requirements for QSAR Modeling
At an international conference held in Setubal, Portugal, in 2002, scientists proposed several rules regarding the validity of QSAR models, known as the "Setubal Principles." These rules were further refined in November 2004 and officially named the "OECD Principles." For a QSAR model to be used for regulatory purposes, it should meet the following 5 conditions:
- A defined endpoint
- An unambiguous algorithm
- A defined domain of applicability
- Appropriate measures of goodness-of-fit, robustness, and predictivity
- A mechanistic interpretation, if possible
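The fourth principle is usually checked numerically. As a rough illustration, the sketch below computes R² on the training data (goodness-of-fit) and a leave-one-out cross-validated Q² (robustness) for a one-descriptor linear model. The descriptor values and activities are made-up placeholders, and the helper names are ours rather than taken from any QSAR package.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def r_squared(ys, preds):
    """1 - SS_res / SS_tot, with SS_tot taken around the mean of ys."""
    my = mean(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

def q2_leave_one_out(xs, ys):
    """Refit without each sample in turn and predict the held-out sample."""
    preds = []
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(slope * xs[i] + intercept)
    return r_squared(ys, preds)

# Placeholder data: one descriptor (e.g. logP) and one activity value per molecule.
x = [1.0, 1.5, 2.1, 2.8, 3.3, 4.0, 4.6]
y = [5.1, 5.4, 6.0, 6.1, 6.9, 7.0, 7.8]

slope, intercept = fit_line(x, y)
r2 = r_squared(y, [slope * xi + intercept for xi in x])
q2 = q2_leave_one_out(x, y)
print(f"R2 = {r2:.3f}, Q2(LOO) = {q2:.3f}")
```

For a least-squares model, Q² can never exceed R²; a large gap between the two is a common warning sign of overfitting. Predictivity in the OECD sense additionally calls for an external test set that played no part in model building.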
Basic Workflow of QSAR Modeling
Building an effective QSAR model mainly involves three steps:
- Constructing a reasonable molecular representation, which converts molecular structures into computer-readable numerical representations;
- Selecting a suitable machine learning model for the molecular representation and using existing molecule-property data to train the model;
- Using the trained machine learning model to predict the properties of molecules with unknown properties.
Since molecular structures are not in a computer-readable format, we must first convert them into numerical vectors that can be read by computers. This allows for the selection of appropriate mathematical models based on these representations. We call this process molecular representation. Effective molecular representation and the choice of compatible mathematical models are the core of building quantitative structure-activity relationship models.
Molecular Representation
Molecular representation is a numerical depiction that includes molecular properties. Common molecular representation methods include molecular descriptors, fingerprints, SMILES strings, and molecular potential functions.
Wei, J., Chu, X., Sun, X. Y., Xu, K., Deng, H. X., Chen, J., ... & Lei, M. (2019). Machine learning in materials science. InfoMat, 1(3), 338-358.
In fact, the development of QSAR has evolved along with the increasing information content and changing forms of molecular representations, leading to the classification of QSAR models into 1D-QSAR, 2D-QSAR, and 3D-QSAR:
Different molecular representations have distinct numerical characteristics, requiring different machine learning/deep learning models for modeling. Next, we will demonstrate how to build 1D-QSAR, 2D-QSAR, and 3D-QSAR models through practical examples.
1D-QSAR Molecular Representation
Early quantitative structure-activity relationship models mostly used physicochemical properties of molecules, such as molecular weight, water solubility, and molecular surface area, as the method of representation. These physicochemical properties are known as molecular descriptors. This defines the 1D-QSAR stage.
At this stage, experienced scientists often rely on their domain knowledge to design molecular descriptors, constructing properties that may be related to the characteristic being studied. For example, if the goal is to predict whether a drug can pass through the blood-brain barrier, this property may be related to the drug's water solubility, molecular weight, polar surface area, and other physicochemical attributes. Scientists would include such attributes in the molecular descriptors.
During this period, because access to computers was limited and computational power was scarce, scientists usually relied on simple mathematical models such as linear regression and random forests. Representations built from molecular descriptors are typically low-dimensional real-valued vectors, which these models handle well.
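A minimal sketch of this setup, assuming each molecule has already been converted into a short descriptor vector: fit an ordinary least-squares model and score it by mean squared error on held-out molecules. The descriptor matrix here is random placeholder data generated from a known linear rule (so that the fit can be checked), not values from the hERG set, and NumPy's `lstsq` stands in for whichever regression library is used in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder descriptors: 200 "molecules" x 3 columns standing in for
# e.g. molecular weight, logP and polar surface area (already standardized).
X = rng.normal(size=(200, 3))
true_w = np.array([0.8, -0.5, 0.3])
y = X @ true_w + 6.0 + rng.normal(scale=0.1, size=200)  # synthetic activity

# Simple 80/20 train/test split.
X_train, X_test = X[:160], X[160:]
y_train, y_test = y[:160], y[160:]

# Append a column of ones so the intercept is learned together with the weights.
A_train = np.hstack([X_train, np.ones((len(X_train), 1))])
A_test = np.hstack([X_test, np.ones((len(X_test), 1))])

coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
pred = A_test @ coef
mse = float(np.mean((y_test - pred) ** 2))

print("weights:", np.round(coef[:3], 2), "intercept:", round(float(coef[3]), 2))
print("test MSE:", round(mse, 4))
```

On real descriptors the columns have very different scales (molecular weight in the hundreds, logP in single digits), so standardizing them before fitting regularized or distance-based models is usually worthwhile.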
[[464.87300000000016, 2.5531800000000002, 1, 10, 140.73000000000002, 8, 4, 0, 0, 12, 166, 0, 0.4159359067517256]]
[1D-QSAR][Linear Regression] MSE:0.8857
[1D-QSAR][Ridge Regression] MSE:0.8857
[1D-QSAR][Lasso Regression] MSE:0.9286
[1D-QSAR][ElasticNet Regression] MSE:0.9269
[1D-QSAR][Support Vector] MSE:0.9398
[1D-QSAR][K-Nearest Neighbors] MSE:0.9110
[1D-QSAR][Decision Tree] MSE:1.0579
[1D-QSAR][Random Forest] MSE:0.6052
[1D-QSAR][Gradient Boosting] MSE:0.7607
[1D-QSAR][XGBoost] MSE:0.6057
[1D-QSAR][LightGBM] MSE:0.6426
[1D-QSAR][Multi-layer Perceptron] MSE:0.9385
<Figure size 1500x1050 with 1 Axes>
2D-QSAR Molecular Representation
However, when facing the challenge of predicting molecular properties with unclear biochemical mechanisms, scientists may find it difficult to design effective molecular descriptors to characterize molecules, leading to the failure of QSAR model construction. Since molecular properties are largely determined by molecular structure, such as the functional groups present on the molecule, there is an interest in incorporating the bonding relationships of molecules into QSAR modeling. Thus, the field has entered the stage of 2D-QSAR.
One of the earlier proposed methods is the molecular fingerprint method, such as Morgan fingerprints, which characterizes molecules by traversing the bonding relationships of each atom and its surrounding atoms. To meet the requirement that molecules of different sizes can be represented by numerical vectors of the same length, molecular fingerprints often use hashing operations to ensure uniform vector length, resulting in high-dimensional 0/1 vectors. In this scenario, scientists typically choose machine learning methods that handle high-dimensional sparse vectors well, such as support vector machines and fully connected neural networks, for model construction.
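The folding step can be illustrated without any chemistry toolkit. The sketch below is not the Morgan algorithm: it uses overlapping character fragments of a SMILES string as stand-ins for atom environments, hashes each fragment to a position in a fixed-length bit vector, and compares two molecules by Tanimoto similarity. The fragment scheme, the 512-bit length, and the function names are illustrative assumptions.

```python
import hashlib

def folded_fingerprint(smiles: str, n_bits: int = 512, max_len: int = 3) -> list:
    """Hash every substring of length 1..max_len into a fixed-length 0/1 vector."""
    bits = [0] * n_bits
    for size in range(1, max_len + 1):
        for start in range(len(smiles) - size + 1):
            fragment = smiles[start:start + size]
            digest = hashlib.sha1(fragment.encode("utf-8")).digest()
            index = int.from_bytes(digest[:4], "big") % n_bits  # fold into n_bits
            bits[index] = 1
    return bits

def tanimoto(a: list, b: list) -> float:
    """Shared on-bits divided by on-bits present in either vector."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0

small = folded_fingerprint("CCO")                           # ethanol
large = folded_fingerprint("CCCCCCCc1cccc([n+]1C)CCCCCCC")  # a SMILES from the table above

print(len(small), len(large))   # same length regardless of molecule size
print(sum(small), sum(large))   # number of bits set in each vector
print(round(tanimoto(small, large), 3))
```

Because different fragments can hash to the same position, folding loses some information; longer vectors reduce such collisions at the cost of sparser inputs.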
With the development of AI models, deep learning models capable of handling sequence data (e.g., text) like Recurrent Neural Networks (RNN), image data like Convolutional Neural Networks (CNN), and unstructured graph data like Graph Neural Networks (GNN) have been proposed and applied. QSAR models have also been constructed to fit molecular representations based on the data characteristics these models can handle. For example, SMILES string representations of molecules have been applied in RNN modeling, 2D images of molecules in CNN modeling, and the bonding topological structure of molecules converted into graphs in GNN modeling, leading to the development of a series of QSAR modeling methods.
Overall, in the 2D-QSAR stage, various methods are utilized to analyze the bonding relationships (topological structure) of molecules to model and predict molecular properties.
[array([0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1])]
[2D-QSAR][Linear Regression] MSE:0.7156
[2D-QSAR][Ridge Regression] MSE:0.7154
[2D-QSAR][Lasso Regression] MSE:1.0109
[2D-QSAR][ElasticNet Regression] MSE:1.0109
[2D-QSAR][Support Vector] MSE:0.4554
[2D-QSAR][K-Nearest Neighbors] MSE:0.4806
[2D-QSAR][Decision Tree] MSE:0.8892
[2D-QSAR][Random Forest] MSE:0.4717
[2D-QSAR][Gradient Boosting] MSE:0.6694
[2D-QSAR][XGBoost] MSE:0.4591
[2D-QSAR][LightGBM] MSE:0.4797
[2D-QSAR][Multi-layer Perceptron] MSE:0.6933
<Figure size 1500x1050 with 1 Axes>
3D-QSAR Molecular Representation
However, because of intermolecular and intramolecular interactions, molecules with similar topological structures may adopt different conformations in different environments. The conformations a molecule adopts in each environment, and their corresponding energies, determine its actual properties. Therefore, scientists aim to incorporate the three-dimensional structure of molecules into QSAR modeling to improve the prediction of molecular properties in specific scenarios. This stage is referred to as the 3D-QSAR stage.
Comparative Molecular Field Analysis (CoMFA) is a widely used 3D-QSAR method. It characterizes the three-dimensional structure of a molecule by calculating the interaction forces (i.e., force fields) at positions in the space surrounding the molecule, usually sampled on a regular grid. The field has also seen other useful attempts, including representations based on electron density, three-dimensional molecular images, or molecular graphs augmented with geometric information.
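To make the grid idea concrete, the sketch below places a few point charges in space, lays a regular grid around them, and records a Coulomb-like potential at each grid point. Flattening the grid gives a fixed-length vector. This is only a toy stand-in for a field-based method, which in practice uses probe atoms with steric and electrostatic terms on aligned molecules; the coordinates, charges, grid size, and distance cutoff below are invented for illustration.

```python
import itertools
import math

# Hypothetical "molecule": (x, y, z, partial charge); all values are invented.
atoms = [
    (0.0, 0.0, 0.0, -0.4),
    (1.2, 0.0, 0.0, 0.2),
    (-0.6, 1.0, 0.0, 0.2),
]

def grid_axis(low: float, high: float, n: int) -> list:
    """n evenly spaced points from low to high, inclusive."""
    step = (high - low) / (n - 1)
    return [low + i * step for i in range(n)]

def electrostatic_field(atoms, axis, min_dist: float = 0.5) -> list:
    """Sum q / r over atoms at every grid point; r is clipped at min_dist
    so that grid points very close to an atom do not blow up."""
    values = []
    for gx, gy, gz in itertools.product(axis, repeat=3):
        total = 0.0
        for ax, ay, az, q in atoms:
            r = math.dist((gx, gy, gz), (ax, ay, az))
            total += q / max(r, min_dist)
        values.append(total)
    return values

axis = grid_axis(-4.0, 4.0, 10)          # 10 points per axis
field = electrostatic_field(atoms, axis)
print("length:", len(field))             # 10 * 10 * 10 grid points
```

The vector length grows with the cube of the grid resolution and with the number of field types recorded, which is why representations of this kind quickly reach thousands of dimensions.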
To handle such high-dimensional spatial information, scientists often choose deep learning methods such as deeper fully connected neural networks (FCNN), 3D-CNNs, and GNNs for modeling.
length: 10000
We can see that 3D-QSAR will construct very long molecular representations. Therefore, we first perform dimensionality reduction on this molecular representation using PCA.
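As a rough sketch of that reduction step, PCA can be written in a few lines with NumPy's SVD: center the feature matrix, decompose it, and keep the projections onto the leading components. The matrix below is random placeholder data with the same 10000-column width as the representation above; the sample count and the number of components kept are arbitrary choices for illustration, and a library implementation may be used instead in practice.

```python
import numpy as np

def pca_reduce(X: np.ndarray, n_components: int):
    """Project rows of X onto the top principal components.

    Returns the reduced matrix and the fraction of total variance
    explained by each kept component."""
    X_centered = X - X.mean(axis=0)
    # Economy SVD: X_centered = U @ diag(S) @ Vt; rows of Vt are the components.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    reduced = X_centered @ Vt[:n_components].T
    explained = (S ** 2) / np.sum(S ** 2)
    return reduced, explained[:n_components]

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10000))   # 50 placeholder molecules x 10000 features

Z, ratio = pca_reduce(X, n_components=16)
print(Z.shape)                     # (50, 16)
print(round(float(ratio.sum()), 3))
```

When PCA is used ahead of a regression model, the components should be fitted on the training molecules only and then applied unchanged to the test molecules, otherwise information from the test set leaks into the representation.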
[3D-QSAR][Linear Regression] MSE:34953863171341550859845632.0000
[3D-QSAR][Ridge Regression] MSE:4392658479297741235159040.0000
[3D-QSAR][Lasso Regression] MSE:805.7580
[3D-QSAR][ElasticNet Regression] MSE:2390.2618
[3D-QSAR][Support Vector] MSE:1.0427
[3D-QSAR][K-Nearest Neighbors] MSE:1.1943
[3D-QSAR][Decision Tree] MSE:1.5984
[3D-QSAR][Random Forest] MSE:0.7831
[3D-QSAR][Gradient Boosting] MSE:0.8663
[3D-QSAR][XGBoost] MSE:0.8103
[3D-QSAR][LightGBM] MSE:0.7307
[3D-QSAR][Multi-layer Perceptron] MSE:3482168556886455484416.0000
<Figure size 1500x1050 with 1 Axes>
Uni-Mol Molecular Representation Learning and Pretraining Framework
Pretraining Model
One of the main challenges in QSAR modeling within the field of drug development is the limited amount of data. Due to the high cost and experimental difficulty of obtaining drug activity data, there is often a lack of labeled data. Insufficient data affects the model's predictive ability, as it may be difficult for the model to capture enough information to describe the relationship between compound structure and biological activity.
Faced with this situation of insufficient labeled data, the pretrain-finetune approach has become a common solution in more mature fields of machine learning, such as natural language processing (NLP) and computer vision (CV). Pretraining involves training the model on a large amount of unlabeled data through self-supervised learning, allowing the model to gain basic information and general capabilities. The model is then fine-tuned on a smaller set of labeled data through supervised learning to equip it with specific problem-solving abilities.
For example, if I want to perform image recognition of cats and dogs but lack sufficient labeled data, I can first pretrain the model using a large set of unlabeled images, enabling it to learn basic concepts of lines, shapes, and contours. Afterward, I can use supervised learning with cat and dog images, allowing the model to quickly learn to distinguish between cats and dogs based on contour information.
The pretraining approach can effectively utilize large amounts of easily accessible unlabeled data to improve the model's generalization ability and predictive performance. In QSAR modeling, we can also leverage the concept of pretraining to address issues related to data quantity and quality.
Introduction to Uni-Mol
Uni-Mol is a universal molecular representation learning framework based on 3D molecular structures, released by DeepModeling in May 2022. It takes 3D molecular structures as model input and is pretrained on around 200 million small-molecule conformations and 3 million protein surface cavity structures, using two self-supervised tasks: atom type restoration and atom coordinate restoration.
Uni-Mol Paper: https://openreview.net/forum?id=6K2RM6wVqKu
Open-source Code: https://github.com/dptech-corp/Uni-Mol
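To give a feel for the two self-supervised tasks, the sketch below builds one training example by hiding the types of a random subset of atoms and perturbing their coordinates; the original types and coordinates become the targets the model must restore. This is a conceptual illustration in plain Python, not Uni-Mol's code: the mask ratio, the noise range, the `[MASK]` token, and the function name are all assumptions made for the example.

```python
import random

MASK = "[MASK]"

def corrupt_conformer(atom_types, coords, mask_ratio=0.15, noise=1.0, seed=0):
    """Build one self-supervised example from a single conformer.

    A random subset of atoms has its type replaced by MASK and its
    coordinates shifted by uniform noise; the originals are returned
    as restoration targets keyed by atom index."""
    rng = random.Random(seed)
    n = len(atom_types)
    n_masked = max(1, round(n * mask_ratio))
    masked = sorted(rng.sample(range(n), n_masked))

    noisy_types = list(atom_types)
    noisy_coords = [tuple(c) for c in coords]
    for i in masked:
        noisy_types[i] = MASK
        noisy_coords[i] = tuple(v + rng.uniform(-noise, noise) for v in coords[i])

    targets = {i: (atom_types[i], tuple(coords[i])) for i in masked}
    return noisy_types, noisy_coords, targets

# Ethanol heavy atoms with rough, hand-written coordinates (illustrative only).
types = ["C", "C", "O"]
xyz = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.3, 0.0)]

inp_types, inp_xyz, targets = corrupt_conformer(types, xyz)
print(inp_types)
print(targets)
```

No activity labels are needed to build such examples, which is what allows pretraining on very large collections of unlabeled conformers.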
The representation learning from 3D information and the effective pretraining approach allow Uni-Mol to outperform SOTA (state of the art) models in almost all downstream tasks related to drug molecules and protein pockets. Uni-Mol can directly handle tasks such as molecular conformation generation and protein-ligand binding pose prediction, surpassing existing solutions. The paper was accepted at the top machine learning conference ICLR 2023.
Next, we will use Uni-Mol to build a model for the hERG molecular activity prediction task:
2023-06-17 12:35:53 | unimol/data/datareader.py | 138 | INFO | Uni-Mol(QSAR) | Anomaly clean with 3 sigma threshold: 7363 -> 7232
2023-06-17 12:35:55 | unimol/data/conformer.py | 56 | INFO | Uni-Mol(QSAR) | Start generating conformers...
7232it [02:54, 41.48it/s]
2023-06-17 12:38:49 | unimol/data/conformer.py | 60 | INFO | Uni-Mol(QSAR) | Failed to generate conformers for 0.01% of molecules.
2023-06-17 12:38:49 | unimol/data/conformer.py | 62 | INFO | Uni-Mol(QSAR) | Failed to generate 3d conformers for 0.07% of molecules.
2023-06-17 12:38:49 | unimol/train.py | 86 | INFO | Uni-Mol(QSAR) | Output directory already exists: ./exp_reg_hERG_0616
2023-06-17 12:38:49 | unimol/train.py | 87 | INFO | Uni-Mol(QSAR) | Warning: Overwrite output directory: ./exp_reg_hERG_0616
2023-06-17 12:38:50 | unimol/models/unimol.py | 107 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2023-06-17 12:38:50 | unimol/models/nnmodel.py | 100 | INFO | Uni-Mol(QSAR) | start training Uni-Mol:unimolv1
2023-06-17 12:39:07 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [1/50] train_loss: 1.0198, val_loss: 0.9594, val_mse: 0.6890, lr: 0.000067, 14.3s
2023-06-17 12:39:16 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [2/50] train_loss: 0.9963, val_loss: 0.9689, val_mse: 0.7027, lr: 0.000099, 8.2s
2023-06-17 12:39:25 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [3/50] train_loss: 0.9583, val_loss: 1.0292, val_mse: 0.7534, lr: 0.000097, 8.1s
2023-06-17 12:39:33 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [4/50] train_loss: 0.9276, val_loss: 0.9263, val_mse: 0.6723, lr: 0.000095, 8.0s
2023-06-17 12:39:41 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [5/50] train_loss: 0.9060, val_loss: 0.8094, val_mse: 0.5913, lr: 0.000093, 8.1s
2023-06-17 12:39:50 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [6/50] train_loss: 0.8745, val_loss: 0.7400, val_mse: 0.5398, lr: 0.000091, 8.1s
2023-06-17 12:40:00 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [7/50] train_loss: 0.8458, val_loss: 0.7638, val_mse: 0.5611, lr: 0.000089, 8.9s
2023-06-17 12:40:08 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [8/50] train_loss: 0.8492, val_loss: 0.7316, val_mse: 0.5450, lr: 0.000087, 8.1s
2023-06-17 12:40:16 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [9/50] train_loss: 0.8398, val_loss: 0.6793, val_mse: 0.5067, lr: 0.000085, 8.0s
2023-06-17 12:40:24 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [10/50] train_loss: 0.8264, val_loss: 0.7524, val_mse: 0.5610, lr: 0.000082, 8.1s
2023-06-17 12:40:32 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [11/50] train_loss: 0.8191, val_loss: 0.6767, val_mse: 0.4993, lr: 0.000080, 8.2s
2023-06-17 12:40:41 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [12/50] train_loss: 0.8035, val_loss: 0.6709, val_mse: 0.4910, lr: 0.000078, 8.1s
2023-06-17 12:40:50 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [13/50] train_loss: 0.7891, val_loss: 0.6657, val_mse: 0.4940, lr: 0.000076, 8.2s
2023-06-17 12:40:58 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [14/50] train_loss: 0.7720, val_loss: 0.6442, val_mse: 0.4793, lr: 0.000074, 8.2s
2023-06-17 12:41:07 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [15/50] train_loss: 0.7678, val_loss: 0.6312, val_mse: 0.4662, lr: 0.000072, 8.3s
2023-06-17 12:41:16 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [16/50] train_loss: 0.7501, val_loss: 0.6796, val_mse: 0.5068, lr: 0.000070, 8.2s
2023-06-17 12:41:24 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [17/50] train_loss: 0.7714, val_loss: 0.5719, val_mse: 0.4233, lr: 0.000068, 8.2s
2023-06-17 12:41:33 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [18/50] train_loss: 0.7501, val_loss: 0.6096, val_mse: 0.4527, lr: 0.000066, 8.1s
2023-06-17 12:41:41 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [19/50] train_loss: 0.7531, val_loss: 0.7351, val_mse: 0.5461, lr: 0.000064, 8.1s
2023-06-17 12:41:49 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [20/50] train_loss: 0.7357, val_loss: 0.5855, val_mse: 0.4357, lr: 0.000062, 8.1s
2023-06-17 12:41:58 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [21/50] train_loss: 0.7334, val_loss: 0.5762, val_mse: 0.4231, lr: 0.000060, 8.2s
2023-06-17 12:42:07 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [22/50] train_loss: 0.7062, val_loss: 0.5763, val_mse: 0.4312, lr: 0.000058, 8.1s
2023-06-17 12:42:15 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [23/50] train_loss: 0.7371, val_loss: 0.5740, val_mse: 0.4278, lr: 0.000056, 8.2s
2023-06-17 12:42:23 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [24/50] train_loss: 0.7131, val_loss: 0.6085, val_mse: 0.4584, lr: 0.000054, 8.1s
2023-06-17 12:42:31 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [25/50] train_loss: 0.7075, val_loss: 0.5816, val_mse: 0.4340, lr: 0.000052, 8.1s
2023-06-17 12:42:39 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [26/50] train_loss: 0.6932, val_loss: 0.5505, val_mse: 0.4123, lr: 0.000049, 8.0s
2023-06-17 12:42:49 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [27/50] train_loss: 0.7009, val_loss: 0.8499, val_mse: 0.6284, lr: 0.000047, 8.2s
2023-06-17 12:42:57 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [28/50] train_loss: 0.6985, val_loss: 0.6224, val_mse: 0.4643, lr: 0.000045, 8.1s
2023-06-17 12:43:05 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [29/50] train_loss: 0.6649, val_loss: 0.6121, val_mse: 0.4566, lr: 0.000043, 8.3s
2023-06-17 12:43:13 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [30/50] train_loss: 0.6705, val_loss: 0.5974, val_mse: 0.4445, lr: 0.000041, 8.0s
2023-06-17 12:43:21 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [31/50] train_loss: 0.6639, val_loss: 0.6243, val_mse: 0.4603, lr: 0.000039, 8.1s
2023-06-17 12:43:29 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [32/50] train_loss: 0.6774, val_loss: 0.5461, val_mse: 0.4065, lr: 0.000037, 8.0s
2023-06-17 12:43:38 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [33/50] train_loss: 0.6568, val_loss: 0.5854, val_mse: 0.4339, lr: 0.000035, 8.1s
2023-06-17 12:43:46 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [34/50] train_loss: 0.6483, val_loss: 0.6151, val_mse: 0.4625, lr: 0.000033, 8.1s
2023-06-17 12:43:54 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [35/50] train_loss: 0.6560, val_loss: 0.5742, val_mse: 0.4311, lr: 0.000031, 8.1s
2023-06-17 12:44:03 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [36/50] train_loss: 0.6448, val_loss: 0.6001, val_mse: 0.4513, lr: 0.000029, 8.6s
2023-06-17 12:44:11 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [37/50] train_loss: 0.6241, val_loss: 0.6042, val_mse: 0.4489, lr: 0.000027, 8.2s
2023-06-17 12:44:19 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [38/50] train_loss: 0.6224, val_loss: 0.6076, val_mse: 0.4564, lr: 0.000025, 8.1s
2023-06-17 12:44:27 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [39/50] train_loss: 0.6247, val_loss: 0.5829, val_mse: 0.4303, lr: 0.000023, 8.2s
2023-06-17 12:44:36 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [40/50] train_loss: 0.6328, val_loss: 0.5770, val_mse: 0.4261, lr: 0.000021, 8.2s
2023-06-17 12:44:44 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [41/50] train_loss: 0.6317, val_loss: 0.5813, val_mse: 0.4320, lr: 0.000019, 8.1s
2023-06-17 12:44:52 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [42/50] train_loss: 0.6271, val_loss: 0.6083, val_mse: 0.4543, lr: 0.000016, 8.1s
2023-06-17 12:44:52 | unimol/utils/metrics.py | 255 | WARNING | Uni-Mol(QSAR) | Early stopping at epoch: 42
2023-06-17 12:44:53 | unimol/tasks/trainer.py | 197 | INFO | Uni-Mol(QSAR) | load model success!
2023-06-17 12:44:54 | unimol/models/nnmodel.py | 123 | INFO | Uni-Mol(QSAR) | fold 0, result {'mse': 0.40648797, 'mae': 0.4646326, 'spearmanr': 0.685573097620505, 'rmse': 0.63756406, 'r2': 0.44125538296758327}
2023-06-17 12:44:55 | unimol/models/unimol.py | 107 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2023-06-17 12:45:03 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [1/50] train_loss: 1.0218, val_loss: 1.0607, val_mse: 0.7615, lr: 0.000067, 8.2s
2023-06-17 12:45:12 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [2/50] train_loss: 1.0089, val_loss: 0.9100, val_mse: 0.6544, lr: 0.000099, 8.1s
2023-06-17 12:45:21 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [3/50] train_loss: 0.9560, val_loss: 0.8490, val_mse: 0.6177, lr: 0.000097, 8.1s
2023-06-17 12:45:30 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [4/50] train_loss: 0.9313, val_loss: 0.8471, val_mse: 0.6134, lr: 0.000095, 8.1s
2023-06-17 12:45:39 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [5/50] train_loss: 0.9388, val_loss: 0.7970, val_mse: 0.5788, lr: 0.000093, 8.2s
2023-06-17 12:45:48 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [6/50] train_loss: 0.8808, val_loss: 0.7646, val_mse: 0.5524, lr: 0.000091, 8.1s
2023-06-17 12:45:57 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [7/50] train_loss: 0.8609, val_loss: 0.8060, val_mse: 0.5749, lr: 0.000089, 8.1s
2023-06-17 12:46:06 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [8/50] train_loss: 0.8700, val_loss: 0.7409, val_mse: 0.5293, lr: 0.000087, 8.1s
2023-06-17 12:46:15 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [9/50] train_loss: 0.8086, val_loss: 0.7425, val_mse: 0.5365, lr: 0.000085, 8.1s
2023-06-17 12:46:23 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [10/50] train_loss: 0.8355, val_loss: 0.8126, val_mse: 0.5832, lr: 0.000082, 8.2s
2023-06-17 12:46:31 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [11/50] train_loss: 0.8044, val_loss: 0.6809, val_mse: 0.4987, lr: 0.000080, 8.2s
2023-06-17 12:46:40 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [12/50] train_loss: 0.7771, val_loss: 0.6235, val_mse: 0.4526, lr: 0.000078, 8.2s
2023-06-17 12:46:49 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [13/50] train_loss: 0.7960, val_loss: 0.6240, val_mse: 0.4551, lr: 0.000076, 8.1s
2023-06-17 12:46:58 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [14/50] train_loss: 0.7717, val_loss: 0.6819, val_mse: 0.5010, lr: 0.000074, 8.2s
2023-06-17 12:47:06 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [15/50] train_loss: 0.7460, val_loss: 0.6513, val_mse: 0.4798, lr: 0.000072, 8.0s
2023-06-17 12:47:14 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [16/50] train_loss: 0.7331, val_loss: 0.6302, val_mse: 0.4661, lr: 0.000070, 8.2s
2023-06-17 12:47:22 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [17/50] train_loss: 0.7535, val_loss: 0.5941, val_mse: 0.4365, lr: 0.000068, 8.2s
2023-06-17 12:47:31 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [18/50] train_loss: 0.7380, val_loss: 0.5936, val_mse: 0.4342, lr: 0.000066, 8.1s
2023-06-17 12:47:40 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [19/50] train_loss: 0.7061, val_loss: 0.6066, val_mse: 0.4422, lr: 0.000064, 8.2s
2023-06-17 12:47:48 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [20/50] train_loss: 0.7326, val_loss: 0.6399, val_mse: 0.4771, lr: 0.000062, 8.2s
2023-06-17 12:47:57 | 
unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [21/50] train_loss: 0.7176, val_loss: 0.6288, val_mse: 0.4616, lr: 0.000060, 9.0s 2023-06-17 12:48:06 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [22/50] train_loss: 0.7048, val_loss: 0.6277, val_mse: 0.4632, lr: 0.000058, 9.2s 2023-06-17 12:48:15 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [23/50] train_loss: 0.6901, val_loss: 0.5978, val_mse: 0.4354, lr: 0.000056, 9.1s 2023-06-17 12:48:23 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [24/50] train_loss: 0.6980, val_loss: 0.5127, val_mse: 0.3796, lr: 0.000054, 8.3s 2023-06-17 12:48:33 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [25/50] train_loss: 0.6828, val_loss: 0.6137, val_mse: 0.4506, lr: 0.000052, 8.1s 2023-06-17 12:48:41 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [26/50] train_loss: 0.6848, val_loss: 0.5487, val_mse: 0.4038, lr: 0.000049, 8.1s 2023-06-17 12:48:49 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [27/50] train_loss: 0.6755, val_loss: 0.5651, val_mse: 0.4137, lr: 0.000047, 8.2s 2023-06-17 12:48:58 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [28/50] train_loss: 0.6721, val_loss: 0.5640, val_mse: 0.4132, lr: 0.000045, 8.2s 2023-06-17 12:49:06 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [29/50] train_loss: 0.6658, val_loss: 0.5870, val_mse: 0.4300, lr: 0.000043, 8.1s 2023-06-17 12:49:14 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [30/50] train_loss: 0.6704, val_loss: 0.5301, val_mse: 0.3882, lr: 0.000041, 8.2s 2023-06-17 12:49:22 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [31/50] train_loss: 0.6458, val_loss: 0.5127, val_mse: 0.3740, lr: 0.000039, 8.1s 2023-06-17 12:49:31 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [32/50] train_loss: 0.6604, val_loss: 0.5854, val_mse: 0.4273, lr: 0.000037, 8.1s 2023-06-17 12:49:39 | 
unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [33/50] train_loss: 0.6626, val_loss: 0.6108, val_mse: 0.4480, lr: 0.000035, 8.2s 2023-06-17 12:49:47 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [34/50] train_loss: 0.6247, val_loss: 0.5063, val_mse: 0.3733, lr: 0.000033, 8.2s 2023-06-17 12:49:56 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [35/50] train_loss: 0.6381, val_loss: 0.5741, val_mse: 0.4197, lr: 0.000031, 8.1s 2023-06-17 12:50:05 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [36/50] train_loss: 0.6286, val_loss: 0.5272, val_mse: 0.3871, lr: 0.000029, 8.0s 2023-06-17 12:50:13 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [37/50] train_loss: 0.6384, val_loss: 0.5324, val_mse: 0.3904, lr: 0.000027, 8.2s 2023-06-17 12:50:21 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [38/50] train_loss: 0.6329, val_loss: 0.5152, val_mse: 0.3793, lr: 0.000025, 8.1s 2023-06-17 12:50:29 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [39/50] train_loss: 0.6288, val_loss: 0.5184, val_mse: 0.3816, lr: 0.000023, 8.0s 2023-06-17 12:50:37 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [40/50] train_loss: 0.6354, val_loss: 0.5204, val_mse: 0.3816, lr: 0.000021, 8.1s 2023-06-17 12:50:45 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [41/50] train_loss: 0.6100, val_loss: 0.5263, val_mse: 0.3872, lr: 0.000019, 8.1s 2023-06-17 12:50:53 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [42/50] train_loss: 0.6270, val_loss: 0.5422, val_mse: 0.3978, lr: 0.000016, 8.2s 2023-06-17 12:51:01 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [43/50] train_loss: 0.5859, val_loss: 0.5251, val_mse: 0.3867, lr: 0.000014, 8.1s 2023-06-17 12:51:10 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [44/50] train_loss: 0.5847, val_loss: 0.5394, val_mse: 0.3978, lr: 0.000012, 8.7s 2023-06-17 12:51:10 | 
unimol/utils/metrics.py | 255 | WARNING | Uni-Mol(QSAR) | Early stopping at epoch: 44 2023-06-17 12:51:11 | unimol/tasks/trainer.py | 197 | INFO | Uni-Mol(QSAR) | load model success! 2023-06-17 12:51:13 | unimol/models/nnmodel.py | 123 | INFO | Uni-Mol(QSAR) | fold 1, result {'mse': 0.3733483, 'mae': 0.43755728, 'spearmanr': 0.7287544581640742, 'rmse': 0.61102235, 'r2': 0.4841641368912225} 2023-06-17 12:51:13 | unimol/models/unimol.py | 107 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt 2023-06-17 12:51:23 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [1/50] train_loss: 1.0566, val_loss: 0.9536, val_mse: 0.7052, lr: 0.000067, 9.2s 2023-06-17 12:51:33 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [2/50] train_loss: 1.0477, val_loss: 0.8633, val_mse: 0.6460, lr: 0.000099, 9.1s 2023-06-17 12:51:43 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [3/50] train_loss: 1.0039, val_loss: 0.9287, val_mse: 0.6972, lr: 0.000097, 9.2s 2023-06-17 12:51:52 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [4/50] train_loss: 0.9543, val_loss: 0.7414, val_mse: 0.5560, lr: 0.000095, 8.8s 2023-06-17 12:52:01 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [5/50] train_loss: 0.9125, val_loss: 0.7524, val_mse: 0.5660, lr: 0.000093, 8.1s 2023-06-17 12:52:09 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [6/50] train_loss: 0.9342, val_loss: 0.6828, val_mse: 0.5135, lr: 0.000091, 8.1s 2023-06-17 12:52:18 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [7/50] train_loss: 0.8851, val_loss: 0.6962, val_mse: 0.5234, lr: 0.000089, 8.2s 2023-06-17 12:52:26 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [8/50] train_loss: 0.8771, val_loss: 0.6158, val_mse: 0.4621, lr: 0.000087, 8.1s 2023-06-17 12:52:35 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch 
[9/50] train_loss: 0.8470, val_loss: 0.5859, val_mse: 0.4403, lr: 0.000085, 8.1s 2023-06-17 12:52:44 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [10/50] train_loss: 0.8372, val_loss: 0.6298, val_mse: 0.4755, lr: 0.000082, 8.4s 2023-06-17 12:52:52 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [11/50] train_loss: 0.8359, val_loss: 0.7237, val_mse: 0.5394, lr: 0.000080, 8.1s 2023-06-17 12:53:00 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [12/50] train_loss: 0.8217, val_loss: 0.5328, val_mse: 0.4006, lr: 0.000078, 8.0s 2023-06-17 12:53:09 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [13/50] train_loss: 0.8404, val_loss: 0.7307, val_mse: 0.5494, lr: 0.000076, 7.9s 2023-06-17 12:53:17 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [14/50] train_loss: 0.8167, val_loss: 0.5784, val_mse: 0.4335, lr: 0.000074, 8.2s 2023-06-17 12:53:25 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [15/50] train_loss: 0.7869, val_loss: 0.5668, val_mse: 0.4273, lr: 0.000072, 8.1s 2023-06-17 12:53:33 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [16/50] train_loss: 0.7866, val_loss: 0.5809, val_mse: 0.4385, lr: 0.000070, 8.2s 2023-06-17 12:53:41 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [17/50] train_loss: 0.7720, val_loss: 0.5132, val_mse: 0.3853, lr: 0.000068, 8.2s 2023-06-17 12:53:50 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [18/50] train_loss: 0.7649, val_loss: 0.6460, val_mse: 0.4820, lr: 0.000066, 8.1s 2023-06-17 12:53:58 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [19/50] train_loss: 0.7703, val_loss: 0.5627, val_mse: 0.4162, lr: 0.000064, 8.1s 2023-06-17 12:54:06 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [20/50] train_loss: 0.7510, val_loss: 0.5498, val_mse: 0.4076, lr: 0.000062, 8.2s 2023-06-17 12:54:15 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [21/50] train_loss: 
0.7405, val_loss: 0.6078, val_mse: 0.4568, lr: 0.000060, 8.2s 2023-06-17 12:54:23 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [22/50] train_loss: 0.7389, val_loss: 0.5206, val_mse: 0.3895, lr: 0.000058, 7.9s 2023-06-17 12:54:31 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [23/50] train_loss: 0.7120, val_loss: 0.5197, val_mse: 0.3883, lr: 0.000056, 8.1s 2023-06-17 12:54:40 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [24/50] train_loss: 0.7085, val_loss: 0.5299, val_mse: 0.3929, lr: 0.000054, 9.1s 2023-06-17 12:54:49 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [25/50] train_loss: 0.7204, val_loss: 0.4924, val_mse: 0.3634, lr: 0.000052, 9.0s 2023-06-17 12:54:58 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [26/50] train_loss: 0.7203, val_loss: 0.4618, val_mse: 0.3460, lr: 0.000049, 8.8s 2023-06-17 12:55:07 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [27/50] train_loss: 0.7021, val_loss: 0.5251, val_mse: 0.3886, lr: 0.000047, 8.3s 2023-06-17 12:55:15 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [28/50] train_loss: 0.7067, val_loss: 0.5120, val_mse: 0.3827, lr: 0.000045, 8.0s 2023-06-17 12:55:23 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [29/50] train_loss: 0.6974, val_loss: 0.5871, val_mse: 0.4380, lr: 0.000043, 8.0s 2023-06-17 12:55:32 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [30/50] train_loss: 0.7017, val_loss: 0.4469, val_mse: 0.3339, lr: 0.000041, 8.2s 2023-06-17 12:55:40 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [31/50] train_loss: 0.6828, val_loss: 0.5249, val_mse: 0.3889, lr: 0.000039, 7.9s 2023-06-17 12:55:48 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [32/50] train_loss: 0.6813, val_loss: 0.5739, val_mse: 0.4252, lr: 0.000037, 8.2s 2023-06-17 12:55:57 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [33/50] train_loss: 0.6845, val_loss: 
0.4893, val_mse: 0.3636, lr: 0.000035, 8.5s 2023-06-17 12:56:05 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [34/50] train_loss: 0.6813, val_loss: 0.6106, val_mse: 0.4473, lr: 0.000033, 8.2s 2023-06-17 12:56:13 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [35/50] train_loss: 0.6657, val_loss: 0.5357, val_mse: 0.4016, lr: 0.000031, 8.4s 2023-06-17 12:56:22 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [36/50] train_loss: 0.6654, val_loss: 0.5157, val_mse: 0.3824, lr: 0.000029, 8.1s 2023-06-17 12:56:30 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [37/50] train_loss: 0.6520, val_loss: 0.5144, val_mse: 0.3813, lr: 0.000027, 8.0s 2023-06-17 12:56:38 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [38/50] train_loss: 0.6396, val_loss: 0.5489, val_mse: 0.4070, lr: 0.000025, 8.2s 2023-06-17 12:56:46 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [39/50] train_loss: 0.6530, val_loss: 0.5142, val_mse: 0.3851, lr: 0.000023, 8.0s 2023-06-17 12:56:54 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [40/50] train_loss: 0.6644, val_loss: 0.5770, val_mse: 0.4274, lr: 0.000021, 8.1s 2023-06-17 12:56:54 | unimol/utils/metrics.py | 255 | WARNING | Uni-Mol(QSAR) | Early stopping at epoch: 40 2023-06-17 12:56:56 | unimol/tasks/trainer.py | 197 | INFO | Uni-Mol(QSAR) | load model success! 
2023-06-17 12:56:57 | unimol/models/nnmodel.py | 123 | INFO | Uni-Mol(QSAR) | fold 2, result {'mse': 0.33389777, 'mae': 0.42934737, 'spearmanr': 0.7043604826771424, 'rmse': 0.5778389, 'r2': 0.5037052463708651} 2023-06-17 12:56:58 | unimol/models/unimol.py | 107 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt 2023-06-17 12:57:06 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [1/50] train_loss: 0.9963, val_loss: 1.1187, val_mse: 0.8181, lr: 0.000067, 8.1s 2023-06-17 12:57:16 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [2/50] train_loss: 0.9809, val_loss: 1.0758, val_mse: 0.7894, lr: 0.000099, 8.9s 2023-06-17 12:57:25 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [3/50] train_loss: 0.9744, val_loss: 0.9197, val_mse: 0.6816, lr: 0.000097, 8.3s 2023-06-17 12:57:34 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [4/50] train_loss: 0.8896, val_loss: 0.8931, val_mse: 0.6608, lr: 0.000095, 8.1s 2023-06-17 12:57:43 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [5/50] train_loss: 0.8730, val_loss: 0.7967, val_mse: 0.5925, lr: 0.000093, 8.1s 2023-06-17 12:57:52 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [6/50] train_loss: 0.8742, val_loss: 0.8469, val_mse: 0.6300, lr: 0.000091, 8.1s 2023-06-17 12:58:00 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [7/50] train_loss: 0.8481, val_loss: 0.7456, val_mse: 0.5564, lr: 0.000089, 8.1s 2023-06-17 12:58:09 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [8/50] train_loss: 0.8167, val_loss: 0.7243, val_mse: 0.5415, lr: 0.000087, 8.1s 2023-06-17 12:58:18 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [9/50] train_loss: 0.7877, val_loss: 0.7422, val_mse: 0.5536, lr: 0.000085, 8.1s 2023-06-17 12:58:26 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [10/50] train_loss: 
0.7905, val_loss: 0.6694, val_mse: 0.5022, lr: 0.000082, 8.0s 2023-06-17 12:58:35 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [11/50] train_loss: 0.7806, val_loss: 0.6803, val_mse: 0.5080, lr: 0.000080, 8.2s 2023-06-17 12:58:43 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [12/50] train_loss: 0.7693, val_loss: 0.7758, val_mse: 0.5808, lr: 0.000078, 8.2s 2023-06-17 12:58:51 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [13/50] train_loss: 0.8012, val_loss: 0.6340, val_mse: 0.4780, lr: 0.000076, 8.1s 2023-06-17 12:59:00 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [14/50] train_loss: 0.7451, val_loss: 0.7638, val_mse: 0.5757, lr: 0.000074, 8.0s 2023-06-17 12:59:08 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [15/50] train_loss: 0.7363, val_loss: 0.6183, val_mse: 0.4703, lr: 0.000072, 8.3s 2023-06-17 12:59:17 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [16/50] train_loss: 0.7324, val_loss: 0.6532, val_mse: 0.4934, lr: 0.000070, 8.2s 2023-06-17 12:59:26 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [17/50] train_loss: 0.7169, val_loss: 0.6582, val_mse: 0.4971, lr: 0.000068, 8.6s 2023-06-17 12:59:35 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [18/50] train_loss: 0.6963, val_loss: 0.6372, val_mse: 0.4803, lr: 0.000066, 9.2s 2023-06-17 12:59:44 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [19/50] train_loss: 0.7022, val_loss: 0.6088, val_mse: 0.4620, lr: 0.000064, 9.0s 2023-06-17 12:59:53 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [20/50] train_loss: 0.6787, val_loss: 0.5849, val_mse: 0.4404, lr: 0.000062, 8.4s 2023-06-17 13:00:02 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [21/50] train_loss: 0.7170, val_loss: 0.6280, val_mse: 0.4728, lr: 0.000060, 8.1s 2023-06-17 13:00:10 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [22/50] train_loss: 0.6678, val_loss: 
0.6732, val_mse: 0.5074, lr: 0.000058, 8.1s 2023-06-17 13:00:18 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [23/50] train_loss: 0.6440, val_loss: 0.5854, val_mse: 0.4393, lr: 0.000056, 7.9s 2023-06-17 13:00:26 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [24/50] train_loss: 0.6608, val_loss: 0.6185, val_mse: 0.4654, lr: 0.000054, 8.1s 2023-06-17 13:00:34 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [25/50] train_loss: 0.6663, val_loss: 0.6564, val_mse: 0.4948, lr: 0.000052, 8.0s 2023-06-17 13:00:42 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [26/50] train_loss: 0.6536, val_loss: 0.5631, val_mse: 0.4249, lr: 0.000049, 8.0s 2023-06-17 13:00:53 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [27/50] train_loss: 0.6346, val_loss: 0.5800, val_mse: 0.4309, lr: 0.000047, 9.1s 2023-06-17 13:01:02 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [28/50] train_loss: 0.6523, val_loss: 0.5852, val_mse: 0.4444, lr: 0.000045, 8.9s 2023-06-17 13:01:10 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [29/50] train_loss: 0.6302, val_loss: 0.5751, val_mse: 0.4311, lr: 0.000043, 8.2s 2023-06-17 13:01:18 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [30/50] train_loss: 0.6229, val_loss: 0.5828, val_mse: 0.4391, lr: 0.000041, 8.0s 2023-06-17 13:01:26 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [31/50] train_loss: 0.6291, val_loss: 0.5778, val_mse: 0.4351, lr: 0.000039, 8.1s 2023-06-17 13:01:34 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [32/50] train_loss: 0.5913, val_loss: 0.5777, val_mse: 0.4298, lr: 0.000037, 8.1s 2023-06-17 13:01:42 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [33/50] train_loss: 0.6336, val_loss: 0.5617, val_mse: 0.4222, lr: 0.000035, 8.2s 2023-06-17 13:01:51 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [34/50] train_loss: 0.6080, val_loss: 0.5359, val_mse: 
0.4038, lr: 0.000033, 8.2s 2023-06-17 13:02:00 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [35/50] train_loss: 0.6015, val_loss: 0.5468, val_mse: 0.4091, lr: 0.000031, 8.0s 2023-06-17 13:02:08 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [36/50] train_loss: 0.5942, val_loss: 0.5709, val_mse: 0.4266, lr: 0.000029, 8.1s 2023-06-17 13:02:16 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [37/50] train_loss: 0.5994, val_loss: 0.5571, val_mse: 0.4149, lr: 0.000027, 8.1s 2023-06-17 13:02:24 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [38/50] train_loss: 0.6010, val_loss: 0.5653, val_mse: 0.4218, lr: 0.000025, 8.1s 2023-06-17 13:02:32 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [39/50] train_loss: 0.5760, val_loss: 0.5826, val_mse: 0.4352, lr: 0.000023, 8.0s 2023-06-17 13:02:40 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [40/50] train_loss: 0.6080, val_loss: 0.5744, val_mse: 0.4271, lr: 0.000021, 8.2s 2023-06-17 13:02:49 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [41/50] train_loss: 0.5717, val_loss: 0.5595, val_mse: 0.4186, lr: 0.000019, 8.1s 2023-06-17 13:02:57 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [42/50] train_loss: 0.5596, val_loss: 0.5536, val_mse: 0.4140, lr: 0.000016, 8.0s 2023-06-17 13:03:05 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [43/50] train_loss: 0.5461, val_loss: 0.5577, val_mse: 0.4174, lr: 0.000014, 8.1s 2023-06-17 13:03:13 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [44/50] train_loss: 0.5775, val_loss: 0.5613, val_mse: 0.4205, lr: 0.000012, 8.1s 2023-06-17 13:03:13 | unimol/utils/metrics.py | 255 | WARNING | Uni-Mol(QSAR) | Early stopping at epoch: 44 2023-06-17 13:03:14 | unimol/tasks/trainer.py | 197 | INFO | Uni-Mol(QSAR) | load model success! 
2023-06-17 13:03:15 | unimol/models/nnmodel.py | 123 | INFO | Uni-Mol(QSAR) | fold 3, result {'mse': 0.40383738, 'mae': 0.45316046, 'spearmanr': 0.7082075781363502, 'rmse': 0.635482, 'r2': 0.4945312503291812} 2023-06-17 13:03:16 | unimol/models/unimol.py | 107 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt 2023-06-17 13:03:24 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [1/50] train_loss: 1.0177, val_loss: 0.9977, val_mse: 0.7278, lr: 0.000067, 8.1s 2023-06-17 13:03:33 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [2/50] train_loss: 1.0013, val_loss: 1.1132, val_mse: 0.8146, lr: 0.000099, 8.0s 2023-06-17 13:03:42 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [3/50] train_loss: 0.9368, val_loss: 0.8208, val_mse: 0.5932, lr: 0.000097, 8.8s 2023-06-17 13:03:51 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [4/50] train_loss: 0.8955, val_loss: 0.8036, val_mse: 0.5778, lr: 0.000095, 8.8s 2023-06-17 13:04:01 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [5/50] train_loss: 0.8553, val_loss: 0.7418, val_mse: 0.5351, lr: 0.000093, 8.9s 2023-06-17 13:04:10 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [6/50] train_loss: 0.8764, val_loss: 0.9070, val_mse: 0.6461, lr: 0.000091, 8.1s 2023-06-17 13:04:18 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [7/50] train_loss: 0.8546, val_loss: 0.6987, val_mse: 0.5121, lr: 0.000089, 8.0s 2023-06-17 13:04:28 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [8/50] train_loss: 0.8261, val_loss: 0.7095, val_mse: 0.5165, lr: 0.000087, 9.0s 2023-06-17 13:04:37 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [9/50] train_loss: 0.8314, val_loss: 0.6756, val_mse: 0.4913, lr: 0.000085, 9.0s 2023-06-17 13:04:46 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [10/50] train_loss: 
0.8112, val_loss: 0.7011, val_mse: 0.5129, lr: 0.000082, 8.0s 2023-06-17 13:04:53 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [11/50] train_loss: 0.7951, val_loss: 0.6594, val_mse: 0.4772, lr: 0.000080, 7.8s 2023-06-17 13:05:02 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [12/50] train_loss: 0.8264, val_loss: 0.6824, val_mse: 0.5008, lr: 0.000078, 8.0s 2023-06-17 13:05:10 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [13/50] train_loss: 0.7938, val_loss: 0.6051, val_mse: 0.4428, lr: 0.000076, 8.0s 2023-06-17 13:05:19 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [14/50] train_loss: 0.7698, val_loss: 0.5837, val_mse: 0.4272, lr: 0.000074, 8.0s 2023-06-17 13:05:28 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [15/50] train_loss: 0.7579, val_loss: 0.6009, val_mse: 0.4390, lr: 0.000072, 8.0s 2023-06-17 13:05:36 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [16/50] train_loss: 0.7400, val_loss: 0.6783, val_mse: 0.4961, lr: 0.000070, 8.0s 2023-06-17 13:05:44 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [17/50] train_loss: 0.7337, val_loss: 0.6855, val_mse: 0.4961, lr: 0.000068, 7.9s 2023-06-17 13:05:52 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [18/50] train_loss: 0.7444, val_loss: 0.6196, val_mse: 0.4444, lr: 0.000066, 8.3s 2023-06-17 13:06:00 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [19/50] train_loss: 0.7323, val_loss: 0.5721, val_mse: 0.4148, lr: 0.000064, 8.2s 2023-06-17 13:06:09 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [20/50] train_loss: 0.7124, val_loss: 0.5886, val_mse: 0.4301, lr: 0.000062, 8.0s 2023-06-17 13:06:18 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [21/50] train_loss: 0.7051, val_loss: 0.5489, val_mse: 0.3975, lr: 0.000060, 8.4s 2023-06-17 13:06:27 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [22/50] train_loss: 0.7238, val_loss: 
0.5354, val_mse: 0.3884, lr: 0.000058, 8.7s 2023-06-17 13:06:36 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [23/50] train_loss: 0.7240, val_loss: 0.6725, val_mse: 0.4873, lr: 0.000056, 8.1s 2023-06-17 13:06:44 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [24/50] train_loss: 0.6937, val_loss: 0.5550, val_mse: 0.4090, lr: 0.000054, 7.9s 2023-06-17 13:06:52 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [25/50] train_loss: 0.6826, val_loss: 0.6393, val_mse: 0.4632, lr: 0.000052, 8.0s 2023-06-17 13:07:00 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [26/50] train_loss: 0.6814, val_loss: 0.5455, val_mse: 0.3975, lr: 0.000049, 7.9s 2023-06-17 13:07:08 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [27/50] train_loss: 0.6589, val_loss: 0.5434, val_mse: 0.3960, lr: 0.000047, 8.0s 2023-06-17 13:07:16 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [28/50] train_loss: 0.6763, val_loss: 0.5872, val_mse: 0.4352, lr: 0.000045, 8.0s 2023-06-17 13:07:23 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [29/50] train_loss: 0.6659, val_loss: 0.5268, val_mse: 0.3853, lr: 0.000043, 7.8s 2023-06-17 13:07:32 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [30/50] train_loss: 0.6684, val_loss: 0.5575, val_mse: 0.4024, lr: 0.000041, 8.0s 2023-06-17 13:07:40 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [31/50] train_loss: 0.6467, val_loss: 0.5300, val_mse: 0.3829, lr: 0.000039, 8.0s 2023-06-17 13:07:50 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [32/50] train_loss: 0.6467, val_loss: 0.5846, val_mse: 0.4279, lr: 0.000037, 8.9s 2023-06-17 13:07:57 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [33/50] train_loss: 0.6270, val_loss: 0.5577, val_mse: 0.4093, lr: 0.000035, 7.9s 2023-06-17 13:08:05 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [34/50] train_loss: 0.6427, val_loss: 0.5868, val_mse: 
0.4248, lr: 0.000033, 8.0s 2023-06-17 13:08:13 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [35/50] train_loss: 0.6501, val_loss: 0.5432, val_mse: 0.3935, lr: 0.000031, 7.9s 2023-06-17 13:08:21 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [36/50] train_loss: 0.6305, val_loss: 0.5176, val_mse: 0.3793, lr: 0.000029, 8.0s 2023-06-17 13:08:30 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [37/50] train_loss: 0.6297, val_loss: 0.5156, val_mse: 0.3713, lr: 0.000027, 7.9s 2023-06-17 13:08:39 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [38/50] train_loss: 0.6432, val_loss: 0.5531, val_mse: 0.4048, lr: 0.000025, 8.0s 2023-06-17 13:08:47 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [39/50] train_loss: 0.6161, val_loss: 0.5432, val_mse: 0.3905, lr: 0.000023, 7.9s 2023-06-17 13:08:55 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [40/50] train_loss: 0.6009, val_loss: 0.5115, val_mse: 0.3746, lr: 0.000021, 7.9s 2023-06-17 13:09:03 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [41/50] train_loss: 0.6052, val_loss: 0.5091, val_mse: 0.3698, lr: 0.000019, 7.9s 2023-06-17 13:09:11 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [42/50] train_loss: 0.6008, val_loss: 0.5425, val_mse: 0.3979, lr: 0.000016, 8.0s 2023-06-17 13:09:19 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [43/50] train_loss: 0.5973, val_loss: 0.5384, val_mse: 0.3888, lr: 0.000014, 8.0s 2023-06-17 13:09:27 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [44/50] train_loss: 0.6030, val_loss: 0.5716, val_mse: 0.4130, lr: 0.000012, 7.9s 2023-06-17 13:09:35 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [45/50] train_loss: 0.5957, val_loss: 0.5366, val_mse: 0.3926, lr: 0.000010, 8.0s 2023-06-17 13:09:43 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [46/50] train_loss: 0.6118, val_loss: 0.5083, val_mse: 0.3681, lr: 
0.000008, 8.0s
2023-06-17 13:09:53 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [47/50] train_loss: 0.5846, val_loss: 0.5475, val_mse: 0.3945, lr: 0.000006, 8.9s
2023-06-17 13:10:01 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [48/50] train_loss: 0.5963, val_loss: 0.5365, val_mse: 0.3876, lr: 0.000004, 8.3s
2023-06-17 13:10:09 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [49/50] train_loss: 0.5992, val_loss: 0.5280, val_mse: 0.3823, lr: 0.000002, 8.0s
2023-06-17 13:10:18 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [50/50] train_loss: 0.5823, val_loss: 0.5252, val_mse: 0.3796, lr: 0.000000, 8.9s
2023-06-17 13:10:20 | unimol/tasks/trainer.py | 197 | INFO | Uni-Mol(QSAR) | load model success!
2023-06-17 13:10:21 | unimol/models/nnmodel.py | 123 | INFO | Uni-Mol(QSAR) | fold 4, result {'mse': 0.3681449, 'mae': 0.4405691, 'spearmanr': 0.7418240886637215, 'rmse': 0.6067495, 'r2': 0.5089094245543386}
2023-06-17 13:10:21 | unimol/models/nnmodel.py | 135 | INFO | Uni-Mol(QSAR) | Uni-Mol metrics score: {'mse': 0.3771468026353712, 'mae': 0.4450550450234257, 'spearmanr': 0.7078743179462539, 'rmse': 0.6141227911707652, 'r2': 0.4866199624063017}
2023-06-17 13:10:21 | unimol/models/nnmodel.py | 136 | INFO | Uni-Mol(QSAR) | Uni-Mol & Metric result saved!
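The per-fold results in the training log above can be collected and summarized programmatically. The following is a minimal sketch rather than part of Uni-Mol: the helper name `parse_fold_results` is hypothetical, and the log excerpt is copied from the `fold N, result {...}` entries above. Note that the plain average of the five fold MSEs is close to, but not exactly equal to, the logged `Uni-Mol metrics score`, so the aggregate is presumably computed differently (for example over the pooled validation predictions; this is an assumption, not something the log states).

```python
import ast
import re
from statistics import mean

# Message parts of the five "fold N, result {...}" entries, copied from the log above.
LOG_EXCERPT = """
fold 0, result {'mse': 0.40648797, 'mae': 0.4646326, 'spearmanr': 0.685573097620505, 'rmse': 0.63756406, 'r2': 0.44125538296758327}
fold 1, result {'mse': 0.3733483, 'mae': 0.43755728, 'spearmanr': 0.7287544581640742, 'rmse': 0.61102235, 'r2': 0.4841641368912225}
fold 2, result {'mse': 0.33389777, 'mae': 0.42934737, 'spearmanr': 0.7043604826771424, 'rmse': 0.5778389, 'r2': 0.5037052463708651}
fold 3, result {'mse': 0.40383738, 'mae': 0.45316046, 'spearmanr': 0.7082075781363502, 'rmse': 0.635482, 'r2': 0.4945312503291812}
fold 4, result {'mse': 0.3681449, 'mae': 0.4405691, 'spearmanr': 0.7418240886637215, 'rmse': 0.6067495, 'r2': 0.5089094245543386}
"""

# Aggregate score reported by the "Uni-Mol metrics score" log entry.
LOGGED_OVERALL_MSE = 0.3771468026353712

# Matches the fold index and the dict literal that follows it on the same line.
FOLD_PATTERN = re.compile(r"fold (\d+), result (\{.*?\})")


def parse_fold_results(log_text):
    """Return {fold_index: metrics_dict} for every fold entry found in log_text."""
    results = {}
    for fold, payload in FOLD_PATTERN.findall(log_text):
        # The payload is a plain Python dict literal, so literal_eval is sufficient.
        results[int(fold)] = ast.literal_eval(payload)
    return results


fold_results = parse_fold_results(LOG_EXCERPT)
mean_mse = mean(r["mse"] for r in fold_results.values())
best_fold = min(fold_results, key=lambda k: fold_results[k]["mse"])

print(f"mean fold MSE:  {mean_mse:.4f}")
print(f"logged overall: {LOGGED_OVERALL_MSE:.4f}")
print(f"fold with lowest validation MSE: {best_fold}")
```

Comparing the fold spread with the aggregate gives a quick sense of how stable the model is across splits; here the fold MSEs range from roughly 0.33 to 0.41.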
2023-06-17 13:21:14 | unimol/data/conformer.py | 56 | INFO | Uni-Mol(QSAR) | Start generating conformers...
7363it [02:58, 41.14it/s]
2023-06-17 13:24:13 | unimol/data/conformer.py | 60 | INFO | Uni-Mol(QSAR) | Failed to generate conformers for 0.01% of molecules.
2023-06-17 13:24:14 | unimol/data/conformer.py | 62 | INFO | Uni-Mol(QSAR) | Failed to generate 3d conformers for 0.07% of molecules.
2023-06-17 13:24:14 | unimol/models/unimol.py | 107 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2023-06-17 13:24:14 | unimol/models/nnmodel.py | 145 | INFO | Uni-Mol(QSAR) | start predict NNModel:unimolv1
2023-06-17 13:24:15 | unimol/tasks/trainer.py | 197 | INFO | Uni-Mol(QSAR) | load model success!
2023-06-17 13:24:21 | unimol/tasks/trainer.py | 197 | INFO | Uni-Mol(QSAR) | load model success!
2023-06-17 13:24:27 | unimol/tasks/trainer.py | 197 | INFO | Uni-Mol(QSAR) | load model success!
2023-06-17 13:24:33 | unimol/tasks/trainer.py | 197 | INFO | Uni-Mol(QSAR) | load model success!
2023-06-17 13:24:40 | unimol/tasks/trainer.py | 197 | INFO | Uni-Mol(QSAR) | load model success!
2023-06-17 13:24:45 | unimol/predict.py | 66 | INFO | Uni-Mol(QSAR) | final predict metrics score: {'mse': 0.20172259087068667, 'mae': 0.2717721822179887, 'spearmanr': 0.9094161936023004, 'rmse': 0.4491353814505006, 'r2': 0.7876258838726424}
2023-06-17 13:24:46 | unimol/data/conformer.py | 56 | INFO | Uni-Mol(QSAR) | Start generating conformers...
1841it [00:40, 45.34it/s]
2023-06-17 13:25:27 | unimol/data/conformer.py | 60 | INFO | Uni-Mol(QSAR) | Failed to generate conformers for 0.00% of molecules.
2023-06-17 13:25:27 | unimol/data/conformer.py | 62 | INFO | Uni-Mol(QSAR) | Failed to generate 3d conformers for 0.05% of molecules.
2023-06-17 13:25:28 | unimol/models/unimol.py | 107 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2023-06-17 13:25:28 | unimol/models/nnmodel.py | 145 | INFO | Uni-Mol(QSAR) | start predict NNModel:unimolv1
2023-06-17 13:25:28 | unimol/tasks/trainer.py | 197 | INFO | Uni-Mol(QSAR) | load model success!
2023-06-17 13:25:30 | unimol/tasks/trainer.py | 197 | INFO | Uni-Mol(QSAR) | load model success!
2023-06-17 13:25:32 | unimol/tasks/trainer.py | 197 | INFO | Uni-Mol(QSAR) | load model success!
2023-06-17 13:25:34 | unimol/tasks/trainer.py | 197 | INFO | Uni-Mol(QSAR) | load model success!
2023-06-17 13:25:36 | unimol/tasks/trainer.py | 197 | INFO | Uni-Mol(QSAR) | load model success!
2023-06-17 13:25:38 | unimol/predict.py | 66 | INFO | Uni-Mol(QSAR) | final predict metrics score: {'mse': 0.4197742218716444, 'mae': 0.42912608007320174, 'spearmanr': 0.7708930974024512, 'rmse': 0.6478998548168108, 'r2': 0.5847316207841755}
[Uni-Mol] MSE:0.4198
<Figure size 1500x1050 with 1 Axes>
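The log above reports `mse`, `mae`, `rmse`, and `r2` for each fold and for the final predictions. As a minimal sketch of how these quantities relate to one another, the snippet below computes them with the standard library only. The function name `regression_metrics` and the toy values are assumptions made for illustration; they are not part of Uni-Mol or of this notebook's code.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, RMSE and R^2 for two equal-length sequences.

    Hypothetical helper for illustration; not the Uni-Mol implementation.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    mse = sum(r * r for r in residuals) / n   # mean squared error
    mae = sum(abs(r) for r in residuals) / n  # mean absolute error
    rmse = math.sqrt(mse)                     # RMSE is the square root of MSE

    # R^2 compares the residual sum of squares with the variance of y_true.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = sum(r * r for r in residuals)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"mse": mse, "mae": mae, "rmse": rmse, "r2": r2}


# Toy values, chosen only to make the arithmetic easy to follow.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

The same relationship can be checked against the log: for fold 4 the reported `rmse` of 0.6067495 is the square root of the reported `mse` of 0.3681449.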
Results Overview
Finally, we can compare, side by side, the performance of the 1D-QSAR, 2D-QSAR, and 3D-QSAR pipelines under different model combinations with the predictive performance of Uni-Mol on the same dataset. The table is sorted by test-set MSE in ascending order, so lower rows indicate worse performance.
Model | MSE | error |
---|---|---|
Uni-Mol | 0.419774 | [2.522239303588867, 2.0335350990295407, 2.1235... |
2D-QSAR-Support Vector | 0.455441 | [1.6594621254469004, 1.801769913338167, 1.3386... |
2D-QSAR-XGBoost | 0.459129 | [1.523523902893066, 1.5693136215209957, 0.7394... |
2D-QSAR-Random Forest | 0.47166 | [1.9880250000000013, 2.382200000000001, 0.8454... |
2D-QSAR-LightGBM | 0.479684 | [2.022284730700359, 2.591602960937026, 0.79469... |
2D-QSAR-K-Nearest Neighbors | 0.480645 | [1.5099999999999998, 1.5079999999999973, 0.975... |
1D-QSAR-Random Forest | 0.605183 | [2.3907239177489146, 2.4765941666666667, 3.016... |
1D-QSAR-XGBoost | 0.605652 | [2.509926891326904, 3.200466728210449, 2.42616... |
1D-QSAR-LightGBM | 0.642647 | [2.346929558613308, 2.5087293396443835, 2.5538... |
2D-QSAR-Gradient Boosting | 0.669449 | [2.918999383876205, 2.653413649160223, 2.55135... |
2D-QSAR-Multi-layer Perceptron | 0.693308 | [1.1202709345431376, 1.3843046457283936, 1.224... |
2D-QSAR-Ridge Regression | 0.715356 | [2.78798850792775, 2.1465278084733654, 2.51336... |
2D-QSAR-Linear Regression | 0.715559 | [2.785544528615863, 2.1429891031766406, 2.5107... |
3D-QSAR-LightGBM | 0.730661 | [3.8540520524439525, 1.364569493019939, 1.6295... |
1D-QSAR-Gradient Boosting | 0.760707 | [4.202714018353101, 4.082667464743396, 3.79711... |
3D-QSAR-Random Forest | 0.783114 | [3.5546999999999995, 2.4127666666666663, 2.851... |
3D-QSAR-XGBoost | 0.810273 | [3.825884914398193, 0.8879476547241207, 1.3634... |
3D-QSAR-Gradient Boosting | 0.866329 | [4.33815854578517, 2.8987060129646505, 2.78310... |
1D-QSAR-Ridge Regression | 0.885736 | [4.533467314501845, 4.120692997179958, 4.01560... |
1D-QSAR-Linear Regression | 0.885739 | [4.533419974565081, 4.120610897612493, 4.01553... |
2D-QSAR-Decision Tree | 0.889239 | [0.5700000000000003, 0.17999999999999972, 1.54... |
1D-QSAR-K-Nearest Neighbors | 0.911012 | [2.523999999999999, 3.3120000000000003, 4.3239... |
1D-QSAR-ElasticNet Regression | 0.926934 | [4.59931604048185, 4.317325121519717, 4.137556... |
1D-QSAR-Lasso Regression | 0.928588 | [4.596617256928905, 4.317406647064892, 4.13327... |
1D-QSAR-Multi-layer Perceptron | 0.938524 | [4.539483485335261, 4.240490901203048, 4.04913... |
1D-QSAR-Support Vector | 0.939777 | [4.7594409976508505, 4.452918330671407, 4.2889... |
2D-QSAR-ElasticNet Regression | 1.010851 | [4.570272986554394, 4.320272986554394, 4.09027... |
2D-QSAR-Lasso Regression | 1.010851 | [4.570272986554394, 4.320272986554394, 4.09027... |
3D-QSAR-Support Vector | 1.042737 | [4.750099446088549, 4.500099446088741, 4.27009... |
1D-QSAR-Decision Tree | 1.057852 | [2.523999999999999, 2.8999999999999995, 2.3549... |
3D-QSAR-K-Nearest Neighbors | 1.194292 | [4.92, 4.12, 5.353999999999999, 4.231999999999... |
3D-QSAR-Decision Tree | 1.598439 | [3.0299999999999994, 1.1399999999999988, 1.659... |
3D-QSAR-Lasso Regression | 805.758003 | [4.569499870078355, 4.319499281941863, 4.08949... |
3D-QSAR-ElasticNet Regression | 2390.261763 | [4.56929324838699, 4.319292354357786, 4.089292... |
3D-QSAR-Multi-layer Perceptron | 3482168556886455484416.0 | [3804401.3051103745, 3804401.5506377984, 38044... |
3D-QSAR-Ridge Regression | 4392658479297741235159040.0 | [4.323969041648061, 3.9322019482891504, 3.6812... |
3D-QSAR-Linear Regression | 34953863171341550859845632.0 | [4.28654773084929, 3.4351554464664877, 3.17109... |
<Figure size 1500x1050 with 1 Axes>
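A comparison like the one above can be sketched in a few lines of Python. The snippet below copies a handful of MSE values from the table, ranks them, separates out the runs whose MSE has clearly blown up, and picks the best model per representation. The dictionary name `results`, the `DIVERGENCE_THRESHOLD` cut-off, and the family-splitting rule are assumptions made for this illustration, not the notebook's own code.

```python
# A subset of (model name -> test MSE) pairs copied from the table above.
results = {
    "Uni-Mol": 0.419774,
    "2D-QSAR-Support Vector": 0.455441,
    "2D-QSAR-XGBoost": 0.459129,
    "1D-QSAR-Random Forest": 0.605183,
    "1D-QSAR-XGBoost": 0.605652,
    "3D-QSAR-LightGBM": 0.730661,
    "3D-QSAR-Random Forest": 0.783114,
    "3D-QSAR-Lasso Regression": 805.758003,
    "3D-QSAR-Linear Regression": 34953863171341550859845632.0,
}

# Hypothetical cut-off: an MSE this large is treated as a failed fit.
DIVERGENCE_THRESHOLD = 100.0

# Rank all models by MSE, lowest (best) first.
ranked = sorted(results.items(), key=lambda item: item[1])

# Separate usable fits from runs whose error has exploded.
stable = [(name, mse) for name, mse in ranked if mse < DIVERGENCE_THRESHOLD]
diverged = [name for name, mse in ranked if mse >= DIVERGENCE_THRESHOLD]

# Best model per representation family ("1D", "2D", "3D", or "Uni-Mol").
# `stable` is already sorted, so the first entry seen per family is the best.
best_per_family = {}
for name, mse in stable:
    family = name.split("-QSAR")[0] if "-QSAR" in name else name
    best_per_family.setdefault(family, (name, mse))

for family, (name, mse) in best_per_family.items():
    print(f"{family}: {name} (MSE={mse:.4f})")
print("Diverged:", diverged)
```

Filtering out the diverged rows before plotting also keeps the y-axis of a bar chart readable, since values on the order of 1e25 would otherwise flatten every other bar.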
One More Thing
Top 5 models:
2D-QSAR-Support Vector: MSE=0.4554
2D-QSAR-XGBoost: MSE=0.4591
2D-QSAR-Random Forest: MSE=0.4717
2D-QSAR-LightGBM: MSE=0.4797
2D-QSAR-K-Nearest Neighbors: MSE=0.4806
1D-QSAR-Random Forest: MSE=0.6052
1D-QSAR-XGBoost: MSE=0.6057
1D-QSAR-LightGBM: MSE=0.6426
2D-QSAR-Gradient Boosting: MSE=0.6694
2D-QSAR-Multi-layer Perceptron: MSE=0.6933
[META][Linear Regression] MSE:0.4761
[META][Ridge Regression] MSE:0.4750
[META][Lasso Regression] MSE:1.0109
[META][ElasticNet Regression] MSE:0.6884
[META][Support Vector] MSE:0.4874
[META][K-Nearest Neighbors] MSE:0.4511
[META][Decision Tree] MSE:0.4866
[META][Random Forest] MSE:0.4824
[META][Gradient Boosting] MSE:0.4765
[META][XGBoost] MSE:0.4829
[META][LightGBM] MSE:0.4774
[META][Multi-layer Perceptron] MSE:0.4604
[Top 5 Meta Model] MSE:0.4656
[Ensemble][K-Nearest Neighbors vs. Top5_Meta] Performance Gain (MSE): -0.0145
[Ensemble][Multi-layer Perceptron vs. Top5_Meta] Performance Gain (MSE): -0.0052
[Ensemble][Ridge Regression vs. Top5_Meta] Performance Gain (MSE): 0.0094
[Ensemble][Linear Regression vs. Top5_Meta] Performance Gain (MSE): 0.0105
[Ensemble][Gradient Boosting vs. Top5_Meta] Performance Gain (MSE): 0.0110
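The "Performance Gain (MSE)" figures in the output above appear to equal each meta model's MSE minus the MSE of the Top 5 meta ensemble. This reading is inferred from the printed numbers, not confirmed by the notebook's code. Under that assumption, the following sketch recomputes the gains from the rounded values shown in the output; the variable names are hypothetical.

```python
# MSE of the five best meta models, copied from the [META] lines above.
meta_mse = {
    "K-Nearest Neighbors": 0.4511,
    "Multi-layer Perceptron": 0.4604,
    "Ridge Regression": 0.4750,
    "Linear Regression": 0.4761,
    "Gradient Boosting": 0.4765,
}

# MSE of the combined "Top 5 Meta Model", copied from the output above.
top5_meta_mse = 0.4656

# Assumed definition: gain = single meta model MSE - Top 5 ensemble MSE.
# A negative value means the single meta model has a LOWER error than
# the Top 5 ensemble; a positive value means the ensemble is better.
gains = {name: mse - top5_meta_mse for name, mse in meta_mse.items()}

for name, gain in gains.items():
    verdict = "single model better" if gain < 0 else "ensemble better"
    print(f"{name} vs. Top5_Meta: {gain:+.4f} ({verdict})")
```

Because the inputs here are already rounded to four decimals, the last digit can differ from the printed value: Gradient Boosting comes out as 0.0109 here against 0.0110 in the output. Read this way, only the K-Nearest Neighbors and Multi-layer Perceptron meta models beat the Top 5 ensemble on this test set, and the best meta model (MSE 0.4511) still does not reach the Uni-Mol result of 0.4198 reported earlier.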