Quantitative Structure-Activity Relationship (QSAR) Models from 0 to 1 & Getting Started with Uni-Mol (Regression Task)
©️ Copyright 2023 @ Authors
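Authors: 郑行, 陈乐天
Date: 2023-06-16
License: This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Quick start: click the "Connect" button above, select the unimol-qsar:v0.5 image and any GPU node configuration, and the notebook will be ready to run after a short wait.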
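In recent years, artificial intelligence (AI) has been advancing at an unprecedented pace, bringing major breakthroughs and transformations to many fields.
In drug discovery, scientists have in fact been applying mathematical and statistical methods to support the R&D process since the last century. Starting from the structures of drug molecules, they build mathematical models to predict biochemical activity. This approach is called Quantitative Structure-Activity Relationship (QSAR). QSAR models have kept evolving as research on drug molecules has deepened and as new AI methods have been proposed.
QSAR modeling is therefore a good microcosm of the development of AI for Science. In this notebook, we use worked examples to show how different types of QSAR models are built.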
Introduction
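Quantitative Structure-Activity Relationship (QSAR) is a method for studying the quantitative relationship between the chemical structure of compounds and their biological activity. It is one of the most important tools in Computer-Aided Drug Design (CADD). QSAR aims to build mathematical models that relate molecular structure to biochemical and physicochemical properties, helping drug scientists make reasonable predictions about the properties of new drug molecules.
Building an effective QSAR model involves several steps:
- construct a suitable molecular representation that converts the molecular structure into a machine-readable numerical form;
- choose a machine learning model suited to that representation and train it on existing molecule-property data;
- use the trained model to predict the properties of molecules that have not yet been measured.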
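The three steps above can be sketched as a tiny pipeline. This is a minimal, dependency-free illustration, not the notebook's own code, and it rests on three assumptions. First, `count_heavy_atoms` is a hypothetical toy featurizer that counts uppercase organic-subset atom symbols in a SMILES string. Second, the model is a one-feature least-squares fit. Third, the training values are made up.

```python
import re
from typing import Callable, List, Sequence, Tuple

# Step 1: molecular representation.
# Hypothetical toy featurizer: number of uppercase organic-subset atom symbols in the
# SMILES string (aromatic lowercase atoms are ignored). It only illustrates the interface.
HEAVY_ATOM = re.compile(r"Cl|Br|[BCNOSPFI]")


def count_heavy_atoms(smiles: str) -> float:
    return float(len(HEAVY_ATOM.findall(smiles)))


# Step 2: train a model on known (molecule, property) pairs.
# Here: ordinary least squares with a single feature, y = w * x + b.
def fit_linear_1d(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx
    return w, mean_y - w * mean_x


def build_qsar_model(featurize: Callable[[str], float],
                     smiles: Sequence[str], targets: Sequence[float]):
    w, b = fit_linear_1d([featurize(s) for s in smiles], targets)

    # Step 3: predict the property of molecules that were never measured.
    def predict(new_smiles: Sequence[str]) -> List[float]:
        return [w * featurize(s) + b for s in new_smiles]

    return predict


# Made-up training data, purely for illustration.
predict = build_qsar_model(count_heavy_atoms, ["C", "CC", "CCC"], [1.0, 3.0, 5.0])
print(predict(["CCCC"]))  # [7.0]
```

In practice, step 1 would use a cheminformatics toolkit and step 2 a library model. The point is only that the featurizer and the model are interchangeable parts.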
QSAR models have developed alongside the evolution of molecular representations and the corresponding advances in machine learning models.
Next, let's look at the files that make up the dataset:
ep_data.csv ep_data_train.csv melt_data_test.csv ep_data_test.csv melt_data.csv melt_data_train.csv
------------ Original data ------------
     SMILEs                      Target
0    C1COC(=O)O1                  33.42
1    C1(=O)O[C@H](CO1)C          -40.58
2    CCOC(=O)OCC                 -42.05
3    CCOC(=O)OC                  -36.27
4    COC(=O)OC                    -6.01
..   ...                            ...
926  COCCOCCOCC(F)OC(F)(F)COC     -5.45
927  COCCOCC(F)OC(F)(F)COCCOC    -21.65
928  COCCOCCOCCOC(F)COC(F)F      -17.70
929  COCCOCC(F)OCCOCC(F)OCF      -22.26
930  COCCOCCOCC(F)OCC(F)OCF      -13.70
[931 rows x 2 columns]
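As we can see, in this dataset:
- molecules are represented as SMILES strings (the SMILEs column);
- the task is a regression task: predicting the continuous value in the Target column. Judging from the file names (melt_data*.csv) and the values (for example 33.42 for ethylene carbonate, C1COC(=O)O1), the target appears to be a melting point in °C.
This is a common molecular property prediction task. Let's set the dataset aside for now and start exploring.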
A Brief History of QSAR
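QSAR grew out of structure-activity relationship (SAR) analysis. SAR dates back to the late 19th century, when chemists began to study how the structure of a compound relates to its biological activity. A key idea from this period is the "lock-and-key" hypothesis. It was proposed by the German chemist Emil Fischer in 1894 and later built upon by Paul Ehrlich (1854-1915) in his receptor concept. The hypothesis holds that the interaction between a compound (the key) and its biological target (the lock) depends on how well their shapes fit. As understanding of intermolecular interactions deepened, scientists found that shape complementarity was not enough. Matching the surface properties of the target (such as hydrophobicity and electrophilicity) with the properties of the corresponding ligand moieties also proved crucial. This led to a range of analytical methods relating structural features to binding affinity, known collectively as structure-activity relationships.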
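However, SAR relied mainly on chemists' experience and intuition and lacked a rigorous theoretical basis and a unified analytical methodology. To overcome these limitations, scientists began in the 1960s to analyze the relationship between molecular structure and biological activity quantitatively, using mathematical and statistical methods.
The earliest precursor of QSAR goes back even further, to 1868. In that year the chemist Alexander Crum Brown and the physiologist Thomas R. Fraser began studying the relationship between compound structure and biological activity. They investigated the biological effects of alkaloids before and after methylation of their basic nitrogen atom. From this work they proposed that the physiological activity of a compound depends on its constitution: biological activity $\Phi$ is a function of chemical constitution $C$, $\Phi = f(C)$. This is the well-known Crum-Brown equation, and the hypothesis laid the foundation for later QSAR research.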
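Other QSAR models followed, such as Hammett's model relating substituent electronic effects to chemical reactivity and Taft's steric parameter model. In 1964, Hansch and Fujita proposed the well-known Hansch model. It holds that the biological activity of a molecule is determined mainly by three effects: hydrophobic ($\pi$), steric ($E_s$) and electronic, or electrostatic ($\sigma$). The model assumes these effects contribute independently and additively. Its full form is $\log(1/C) = -a\pi^2 + b\pi + \rho\sigma + \delta E_s + k$, where $C$ is the molar concentration that produces a given biological effect and $a$, $b$, $\rho$, $\delta$, $k$ are regression coefficients. The Hansch model was the first to describe quantitatively the relationship between chemical information and drug activity. It gave later QSAR research a practical theoretical framework and is regarded as an important milestone in the transition from trial-and-error to rational drug design.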
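Fitting a Hansch equation is a multiple linear regression. The sketch below assumes the parabolic form $\log(1/C) = -a\pi^2 + b\pi + \rho\sigma + \delta E_s + k$. It regresses $\log(1/C)$ on $\pi^2$, $\pi$, $\sigma$ and $E_s$ by solving the least-squares normal equations in plain Python, then recovers the coefficients from synthetic, noise-free data. The substituent values and coefficients are invented for illustration. With real data one would normally call a library routine such as `numpy.linalg.lstsq`.

```python
from itertools import product
from typing import List, Sequence, Tuple


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b for a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_hansch(pi: Sequence[float], sigma: Sequence[float], es: Sequence[float],
               log_inv_c: Sequence[float]) -> Tuple[float, ...]:
    """Least-squares fit of log(1/C) = -a*pi^2 + b*pi + rho*sigma + delta*Es + k.

    Returns (a, b, rho, delta, k).
    """
    rows = [[p * p, p, s, e, 1.0] for p, s, e in zip(pi, sigma, es)]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(5)] for i in range(5)]
    xty = [sum(r[i] * y for r, y in zip(rows, log_inv_c)) for i in range(5)]
    beta = solve_linear_system(xtx, xty)
    return (-beta[0], beta[1], beta[2], beta[3], beta[4])  # the pi^2 term enters with a minus sign


# Synthetic, noise-free "substituent" data generated from chosen coefficients.
a, b, rho, delta, k = 0.5, 1.2, 0.8, 0.3, 2.0
grid = list(product([-1.0, 0.0, 1.0, 2.0], [-0.2, 0.3], [-1.0, 0.0]))
pi_vals, sigma_vals, es_vals = (list(col) for col in zip(*grid))
log_inv_c = [-a * p**2 + b * p + rho * s + delta * e + k for p, s, e in grid]

print([round(v, 6) for v in fit_hansch(pi_vals, sigma_vals, es_vals, log_inv_c)])  # [0.5, 1.2, 0.8, 0.3, 2.0]
```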
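Today, QSAR is a mature research field that encompasses many computational methods and techniques. In recent years, the rapid development of machine learning and AI has further extended QSAR methods and their applications. For example, deep learning has been used to build QSAR models with improved predictive performance. QSAR methods are also widely used in fields such as environmental science and materials science, which shows their broad potential.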
Basic Requirements for QSAR Modeling
At an international meeting held in Setubal, Portugal, in 2002, participants proposed a set of rules for the validity of QSAR models, known as the "Setubal Principles". These rules were revised in more detail in November 2004 and formally named the "OECD Principles". To be used for regulatory purposes, a QSAR model should satisfy the following five conditions:
- a defined endpoint
- an unambiguous algorithm
- a defined domain of applicability
- appropriate measures of goodness-of-fit, robustness and predictivity
- a mechanistic interpretation, if possible
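For the fourth principle, the usual regression measures of goodness of fit are easy to express in code. The sketch below implements, in plain Python, the metrics that the Uni-Mol training log later in this notebook reports (mse, mae, pearsonr, spearmanr, r2). It is an illustration, not the library's implementation; in practice `sklearn.metrics` and `scipy.stats` provide these. Robustness and predictivity are then assessed by computing such metrics on cross-validation folds or on an external test set, rather than on the training data.

```python
import math
from typing import List, Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.5]
print(mse(y_true, y_pred), mae(y_true, y_pred))  # 0.1875 0.375
print(round(r2(y_true, y_pred), 4), round(pearson(y_true, y_pred), 4), spearman(y_true, y_pred))
```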
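Molecular Representation
A molecular representation is a numerical encoding of a molecule that carries information about its properties. Common examples include molecular descriptors, molecular fingerprints, SMILES strings and molecular potential functions.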
Wei, J., Chu, X., Sun, X. Y., Xu, K., Deng, H. X., Chen, J., ... & Lei, M. (2019). Machine learning in materials science. InfoMat, 1(3), 338-358.
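QSAR models are in fact classified by how much information their molecular representations contain and by the form those representations take. The common classes are 1D-QSAR, 2D-QSAR and 3D-QSAR.
Different molecular representations have different numerical characteristics, so each calls for a different machine learning or deep learning model. Next, we use practical examples to show how to build 1D-QSAR, 2D-QSAR and 3D-QSAR models.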
1D-QSAR Molecular Representation
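Early QSAR models mostly represented molecules by physicochemical properties such as molecular weight, water solubility and molecular surface area. These properties are called molecular descriptors, and this is the 1D-QSAR stage.
At this stage, experienced scientists had to design descriptors from their domain knowledge, choosing molecular properties likely to be related to the target property. Take predicting whether a drug can cross the blood-brain barrier. That property may depend on physicochemical attributes such as water solubility, molecular weight and polar surface area, so scientists would include these attributes among the descriptors.
Computers were not yet widespread at this stage, or lacked computing power, so scientists typically relied on simple statistical models such as linear regression. Today, classical machine learning models such as random forests are also common choices. Descriptor-based representations are usually low-dimensional real-valued vectors, which these models handle well.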
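This export does not show the notebook's descriptor code. As a dependency-free illustration of turning a structure into a low-dimensional descriptor vector, the hypothetical `toy_descriptors` below counts elements and simple structural features directly from a SMILES string. It ignores implicit hydrogens, charges, isotopes and stereochemistry, so it does not replace a cheminformatics toolkit.

```python
import re
from typing import Dict, List

# Hypothetical toy descriptors read directly from a SMILES string (no cheminformatics toolkit).
# Implicit hydrogens, charges, isotopes and stereochemistry are ignored.
TOKEN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOSPFI]|[bcnops]|\(|\)|=|#|%\d\d|\d")
BRACKET_ELEMENT = re.compile(r"\[\d*([A-Z][a-z]?|[a-z]+)")
FEATURES = ["C", "N", "O", "F", "Cl", "ring_closures", "branches", "double_bonds", "triple_bonds"]


def toy_descriptors(smiles: str) -> List[float]:
    counts: Dict[str, int] = dict.fromkeys(FEATURES, 0)
    ring_labels = 0
    for tok in TOKEN.findall(smiles):
        if tok == "(":
            counts["branches"] += 1
        elif tok == "=":
            counts["double_bonds"] += 1
        elif tok == "#":
            counts["triple_bonds"] += 1
        elif tok == ")":
            continue
        elif tok[0] == "%" or tok.isdigit():
            ring_labels += 1  # each ring closure is written twice: once to open, once to close
        else:
            if tok[0] == "[":  # bracket atom such as [C@H] or [nH]: keep only the element
                match = BRACKET_ELEMENT.match(tok)
                tok = match.group(1) if match else ""
            element = tok.capitalize()  # aromatic 'c' -> 'C'
            if element in counts:
                counts[element] += 1
    counts["ring_closures"] = ring_labels // 2
    return [float(counts[name]) for name in FEATURES]


print(dict(zip(FEATURES, toy_descriptors("C1(=O)O[C@H](CO1)C"))))
```

For propylene carbonate, C1(=O)O[C@H](CO1)C, this yields 4 carbons, 3 oxygens, one ring closure, two branches and one double bond.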
[[188.14499999999998, 1.1842, 0, 3, 27.69, 7, 6, 74, 0.5613667869060323]]
[1D-QSAR][Linear Regression] MSE:246.5578
[1D-QSAR][Ridge Regression] MSE:245.0466
[1D-QSAR][Lasso Regression] MSE:243.1873
[1D-QSAR][ElasticNet Regression] MSE:245.4103
[1D-QSAR][Support Vector] MSE:343.4611
[1D-QSAR][K-Nearest Neighbors] MSE:201.7406
[1D-QSAR][Decision Tree] MSE:209.0326
[1D-QSAR][Random Forest] MSE:180.9326
[1D-QSAR][Gradient Boosting] MSE:203.7126
[1D-QSAR][XGBoost] MSE:211.7050
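Among the models compared above, k-nearest neighbors (KNN) predicts a molecule's property by averaging the targets of its most similar training molecules. Because KNN relies on distances, the descriptors should be put on a common scale first. In the descriptor vector printed above, one value is about 188 and another about 0.56, so unscaled distances would be dominated by the first. The sketch below is a plain-Python KNN regressor with z-score standardization on made-up data. It is not the notebook's code, which is not shown, and whether the notebook scaled its features is unknown. The usual library route is `sklearn.neighbors.KNeighborsRegressor` combined with `sklearn.preprocessing.StandardScaler`.

```python
import math
from typing import List, Sequence, Tuple


def fit_standardizer(rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-feature mean and (population) standard deviation; constant features get std 1."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0 for j in range(d)]
    return means, stds


def standardize(row: Sequence[float], means: Sequence[float], stds: Sequence[float]) -> List[float]:
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


class KNNRegressor:
    """Predicts the mean target of the k nearest training points (Euclidean, standardized)."""

    def __init__(self, k: int = 5):
        self.k = k

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> "KNNRegressor":
        self.means, self.stds = fit_standardizer(X)  # statistics from training data only
        self.X = [standardize(r, self.means, self.stds) for r in X]
        self.y = list(y)
        return self

    def predict(self, X: Sequence[Sequence[float]]) -> List[float]:
        preds = []
        for row in X:
            q = standardize(row, self.means, self.stds)
            nearest = sorted((math.dist(q, x), t) for x, t in zip(self.X, self.y))[: self.k]
            preds.append(sum(t for _, t in nearest) / len(nearest))
        return preds


# Made-up data: feature 0 has a large scale but is irrelevant; feature 1 drives the target.
X_train = [[100.0, 0.1], [300.0, 0.2], [200.0, 0.9], [250.0, 1.0]]
y_train = [1.0, 1.0, 10.0, 10.0]
model = KNNRegressor(k=2).fit(X_train, y_train)
print(model.predict([[110.0, 0.95]]))  # [10.0]
```

Without standardization, the query's two nearest neighbors by raw distance would be the first and third training points, giving 5.5 instead of 10.0.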
2D-QSAR Molecular Representation
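For some property prediction problems, however, the underlying biochemical mechanisms are poorly understood. Scientists may then struggle to design effective descriptors, and the QSAR model fails. Molecular properties are largely determined by molecular structure, for example by which functional groups a molecule carries. Researchers therefore sought to bring the bonding (connectivity) of molecules into QSAR modeling, and the field entered the 2D-QSAR stage.
Among the earlier approaches were molecular fingerprints such as the Morgan fingerprint, which encode a molecule by iterating over each atom and its bonded neighbors. Molecules of different sizes must be encoded as vectors of the same length, so fingerprints are hashed into a fixed number of bits. This is why they are typically high-dimensional binary (0/1) vectors. In this setting, scientists usually choose methods that handle high-dimensional sparse vectors well, such as support vector machines and fully connected neural networks.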
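The Morgan procedure described above can be sketched in a few lines. This is a simplified, hypothetical version that works on a hand-written molecular graph. The initial atom invariant is only (element, number of heavy neighbors), bond types are ignored, and identifiers are hashed with CRC32 and folded into `n_bits` positions. RDKit's Morgan fingerprint (for example `AllChem.GetMorganFingerprintAsBitVect`) uses richer invariants and handles details this sketch skips.

```python
import zlib
from typing import Dict, List, Sequence, Tuple


def stable_hash(obj) -> int:
    """Deterministic hash (Python's built-in hash() of strings changes between runs)."""
    return zlib.crc32(repr(obj).encode("utf-8"))


def circular_fingerprint(atoms: Sequence[str], bonds: Sequence[Tuple[int, int]],
                         radius: int = 2, n_bits: int = 64) -> List[int]:
    neighbors: Dict[int, List[int]] = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)
    # Radius 0: identifier from simple atom invariants (element symbol, number of heavy neighbors).
    ids = [stable_hash((atoms[i], len(neighbors[i]))) for i in range(len(atoms))]
    all_ids = set(ids)
    for _ in range(radius):
        # Each new identifier combines the atom's identifier with its neighbors' identifiers;
        # sorting them makes the result independent of how the atoms are numbered.
        ids = [stable_hash((ids[i], tuple(sorted(ids[j] for j in neighbors[i]))))
               for i in range(len(atoms))]
        all_ids.update(ids)
    # Fold the variable-size set of environment identifiers into a fixed-length 0/1 vector.
    bits = [0] * n_bits
    for identifier in all_ids:
        bits[identifier % n_bits] = 1
    return bits


# Ethanol heavy atoms C-C-O, written with two different atom orderings.
fp_a = circular_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
fp_b = circular_fingerprint(["O", "C", "C"], [(0, 1), (1, 2)])
print(fp_a == fp_b, len(fp_a), sum(fp_a))
```

Because neighbor identifiers are sorted before hashing, renumbering the atoms leaves the fingerprint unchanged. The folding step is what maps molecules of any size to the same vector length, at the cost of occasional bit collisions.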
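As AI advanced, new deep learning models were proposed and applied. These include recurrent neural networks (RNNs) for sequence data such as text, convolutional neural networks (CNNs) for images, and graph neural networks (GNNs) for graph-structured data. QSAR researchers built molecular representations to match the data each model handles. They applied RNNs to SMILES strings, CNNs to 2D images of molecules, and GNNs to graphs built from bond topology, which produced a family of QSAR modeling methods.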
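To feed SMILES strings into an RNN, each string is usually split into chemically meaningful tokens and mapped to a fixed-length sequence of integer ids. The sketch below shows one common regex-based approach. The regular expression, the special tokens `<pad>` and `<unk>`, and the padding length are illustrative assumptions, not anything used in this notebook.

```python
import re
from typing import Dict, List, Sequence

# Regex SMILES tokenizer: bracket atoms, two-letter halogens, %NN ring labels, then single characters.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|[BCNOSPFIbcnops]|[=#\-+\\/():.@*$~]|\d")


def tokenize(smiles: str) -> List[str]:
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:  # some characters were not covered by the pattern
        raise ValueError(f"cannot tokenize {smiles!r}")
    return tokens


def build_vocab(smiles_list: Sequence[str]) -> Dict[str, int]:
    vocab = {"<pad>": 0, "<unk>": 1}
    for smiles in smiles_list:
        for tok in tokenize(smiles):
            vocab.setdefault(tok, len(vocab))  # new tokens get the next free id
    return vocab


def encode(smiles: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Token ids, truncated or right-padded with <pad> to max_len."""
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in tokenize(smiles)][:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


vocab = build_vocab(["C1COC(=O)O1", "CCOC(=O)OCC"])
print(tokenize("C[C@H](F)Cl"))          # ['C', '[C@H]', '(', 'F', ')', 'Cl']
print(encode("CCF", vocab, max_len=5))  # [2, 2, 1, 0, 0]
```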
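Overall, researchers in the 2D-QSAR stage used various methods to parse the bonding relationships (topology) of molecules in order to model and predict molecular properties.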
[array([0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])]
[2D-QSAR][Ridge Regression] MSE:146.4245
[2D-QSAR][Lasso Regression] MSE:323.7586
[2D-QSAR][ElasticNet Regression] MSE:357.5676
[2D-QSAR][Support Vector] MSE:331.7263
[2D-QSAR][K-Nearest Neighbors] MSE:187.5412
[2D-QSAR][Decision Tree] MSE:217.8705
[2D-QSAR][Random Forest] MSE:147.6240
[2D-QSAR][Gradient Boosting] MSE:164.3132
[2D-QSAR][XGBoost] MSE:134.4185
3D-QSAR Molecular Representation
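However, because of intermolecular and intramolecular interactions, molecules with similar topologies may adopt different conformations in different environments. The conformations a molecule adopts in each environment, together with their energies, determine its actual properties. Scientists therefore sought to bring three-dimensional molecular structure into QSAR modeling to improve property prediction in specific settings. This is known as the 3D-QSAR stage.
Comparative Molecular Field Analysis (CoMFA) is a widely used 3D-QSAR method. It places the molecule in a grid of points covering the surrounding space. At each point it computes the interaction energy between the molecule and a probe atom, which gives the steric and electrostatic fields, and uses these values to represent the molecule's three-dimensional structure. Other useful approaches include representations based on electron density or 3D molecular images, and adding geometric information to molecular graphs.
To process such high-dimensional spatial information, scientists often turn to deep learning methods such as deeper fully connected neural networks (FCNNs), 3D-CNNs and GNNs.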
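The grid-based field idea can be illustrated with a short, simplified sketch. It is not CoMFA itself. It skips molecular alignment and uses one placeholder Lennard-Jones parameter pair for every atom. Electrostatics is a plain Coulomb term with a +1 probe charge, and energies are capped at 30 kcal/mol, the truncation commonly used in CoMFA. The water-like coordinates and charges are made up. The sketch shows why such representations get long: there are two values per grid point, and the number of grid points grows with the volume of the box.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Atom = Tuple[float, float, float, float]  # x, y, z in ångström, partial charge in e

COULOMB_K = 332.06           # kcal·Å/(mol·e²)
LJ_EPS, LJ_SIGMA = 0.1, 3.4  # placeholder Lennard-Jones parameters, same for every atom
CUTOFF = 30.0                # energy cap in kcal/mol


def grid_axis(lo: float, hi: float, spacing: float, margin: float) -> List[float]:
    n = int(math.floor((hi - lo + 2 * margin) / spacing)) + 1
    return [lo - margin + i * spacing for i in range(n)]


def field_descriptor(atoms: Sequence[Atom], spacing: float = 1.0, margin: float = 2.0,
                     probe_charge: float = 1.0) -> List[float]:
    """Steric and electrostatic probe energies at every point of a box grid around the atoms."""
    axes = [grid_axis(min(a[d] for a in atoms), max(a[d] for a in atoms), spacing, margin)
            for d in range(3)]
    steric: List[float] = []
    electrostatic: List[float] = []
    for point in itertools.product(*axes):
        e_steric = e_elec = 0.0
        for x, y, z, q in atoms:
            r = max(math.dist(point, (x, y, z)), 0.5)  # avoid the singularity at atom centres
            e_steric += 4 * LJ_EPS * ((LJ_SIGMA / r) ** 12 - (LJ_SIGMA / r) ** 6)
            e_elec += COULOMB_K * probe_charge * q / r
        # Cap large energies so points inside the molecule do not dominate the descriptor.
        steric.append(min(e_steric, CUTOFF))
        electrostatic.append(max(-CUTOFF, min(e_elec, CUTOFF)))
    return steric + electrostatic


# Made-up water-like molecule: O at the origin and two H atoms.
water = [(0.0, 0.0, 0.0, -0.8), (0.96, 0.0, 0.0, 0.4), (-0.24, 0.93, 0.0, 0.4)]
descriptor = field_descriptor(water)
print(len(descriptor))  # 300
```

For this three-atom example the box spans 6 × 5 × 5 = 150 grid points, so the descriptor has 300 values. A finer spacing or a larger molecule quickly reaches thousands of dimensions.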
length: 10000
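As we can see, the 3D-QSAR representation is very long (10,000 values here). We therefore first reduce its dimensionality with principal component analysis (PCA).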
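PCA projects the data onto the directions of largest variance. The sketch below finds the leading principal components by power iteration with deflation on the covariance matrix, as a plain-Python illustration only. It builds the full d × d covariance matrix, which would be impractical for a 10,000-dimensional representation. For data of that size, a library implementation such as `sklearn.decomposition.PCA` (SVD-based) is the practical choice. This export does not show how many components the notebook kept.

```python
import math
import random
from typing import List, Sequence, Tuple


def pca_power_iteration(rows: Sequence[Sequence[float]], n_components: int,
                        n_iter: int = 200, seed: int = 0) -> Tuple[List[List[float]], List[List[float]]]:
    """Top principal components via power iteration with deflation.

    Builds the full d x d covariance matrix, so it is only practical for small d.
    Returns (components, projected_rows).
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    components: List[List[float]] = []
    for _ in range(n_components):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(n_iter):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # no variance left in the remaining directions
                break
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # Deflation: remove the variance explained by v so the next pass finds the next component.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)] for i in range(d)]
    projected = [[sum(c[j] * comp[j] for j in range(d)) for comp in components] for c in centered]
    return components, projected


# Points lying exactly on the direction (1, 2): the first component is ±(1, 2)/sqrt(5).
points = [[float(t), 2.0 * t] for t in range(-2, 3)]
comps, proj = pca_power_iteration(points, n_components=1)
print([round(abs(x), 4) for x in comps[0]])
```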
[3D-QSAR][Ridge Regression] MSE:287.1863
[3D-QSAR][Lasso Regression] MSE:425.9835
[3D-QSAR][ElasticNet Regression] MSE:425.9835
[3D-QSAR][Support Vector] MSE:287.4649
[3D-QSAR][K-Nearest Neighbors] MSE:261.1949
[3D-QSAR][Decision Tree] MSE:523.6909
[3D-QSAR][Random Forest] MSE:234.6376
[3D-QSAR][Gradient Boosting] MSE:196.9132
[3D-QSAR][XGBoost] MSE:273.1910
Uni-Mol: A Molecular Representation Learning and Pre-training Framework
Pre-trained Models
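A major challenge for QSAR modeling in drug discovery is limited data. Drug activity data are expensive and experimentally difficult to obtain, which leaves labeled data scarce. Too little data hurts predictive performance, because the model may not capture enough information to describe the relationship between compound structure and biological activity.
Fields where machine learning is more mature, such as natural language processing (NLP) and computer vision (CV), also face scarce labeled data. There, the pretrain-finetune paradigm has become the general solution. Pre-training means training a model by self-supervised learning on large amounts of unlabeled data so that it acquires basic knowledge and general capabilities. The model is then fine-tuned by supervised learning on the limited labeled data so that it can reason about the specific problem.
For example, suppose I want to classify images of cats and dogs but have few labeled cat and dog images. I can first pre-train a model on many unlabeled images so that it learns basic knowledge about points, lines, surfaces and contours. I then train it with supervision on the cat and dog images. At that point, the model may quickly learn to tell cats from dogs, building on the contour information it already has.
Pre-training makes full use of the information in large amounts of easily obtained unlabeled data, improving model generalization and predictive performance. In QSAR modeling, we can likewise borrow the idea of pre-training to address problems of data quantity and data quality.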
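Introduction to Uni-Mol
Uni-Mol is a general-purpose molecular representation learning framework based on 3D molecular structure, released by DP Technology in May 2022. It takes 3D molecular structures as input. It was pre-trained on about 200 million small-molecule conformations and 3 million protein pocket structures, using two self-supervised tasks: recovering masked atom types and recovering atomic coordinates.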
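The two self-supervised tasks can be pictured as a data-corruption step: hide some atom types, perturb some coordinates, and train the model to recover the originals. The sketch below only builds one corrupted example. It does not come from this notebook, and three of its settings are assumptions for illustration: the 15% mask ratio, the uniform noise of up to 1 Å per axis, and the choice to perturb the same atoms that are masked. Consult the Uni-Mol paper and code for the actual settings.

```python
import random
from typing import Sequence, Tuple

MASK = "[MASK]"
Coord = Tuple[float, float, float]


def corrupt_for_pretraining(atom_types: Sequence[str], coords: Sequence[Coord],
                            mask_ratio: float = 0.15, noise: float = 1.0, seed: int = 0):
    """Build one corrupted training example and the targets to recover.

    A random subset of atoms has its type replaced by MASK and its coordinates shifted
    by uniform noise in [-noise, noise] (Å) along each axis; the other atoms are unchanged.
    """
    rng = random.Random(seed)
    n_mask = max(1, round(mask_ratio * len(atom_types)))
    masked = sorted(rng.sample(range(len(atom_types)), n_mask))
    masked_set = set(masked)
    input_types = [MASK if i in masked_set else t for i, t in enumerate(atom_types)]
    input_coords = [
        tuple(c + rng.uniform(-noise, noise) for c in xyz) if i in masked_set else tuple(xyz)
        for i, xyz in enumerate(coords)
    ]
    targets = {
        "atom_types": {i: atom_types[i] for i in masked},  # what the model should predict at MASK
        "coords": {i: tuple(coords[i]) for i in masked},      # original positions before noise
    }
    return input_types, input_coords, targets


# Ethanol with explicit hydrogens; the coordinates are made up.
types = ["C", "C", "O", "H", "H", "H", "H", "H", "H"]
coords = [(float(i), 0.0, 0.0) for i in range(len(types))]
inp_types, inp_coords, targets = corrupt_for_pretraining(types, coords, mask_ratio=0.3, seed=1)
print(inp_types)
print(targets["atom_types"])
```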
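Uni-Mol paper: https://openreview.net/forum?id=6K2RM6wVqKu
Open-source code: https://github.com/dptech-corp/Uni-Mol
Uni-Mol builds its representations from 3D information and uses an effective pre-training scheme. With these, it outperformed the state of the art (SOTA) on almost all downstream tasks involving drug molecules and protein pockets. It can also directly perform 3D conformation generation tasks such as molecular conformation generation and protein-ligand binding pose prediction, outperforming existing solutions. The paper was accepted at ICLR 2023, a top machine learning conference.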
Next, we use Uni-Mol to build a model for the same regression task:
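The training log that follows reports five cross-validation folds, each ending with early stopping. The sketch below illustrates both ideas in plain Python and is not Uni-Mol's own implementation. The shuffling seed, the interleaved fold assignment and the 10-epoch patience are assumptions. The patience of 10 is consistent with the fold 0 log, where the best val_mse of 73.3333 at epoch 6 is followed by early stopping at epoch 16.

```python
import random
from typing import List, Sequence, Tuple


def kfold_indices(n: int, k: int = 5, seed: int = 42) -> List[Tuple[List[int], List[int]]]:
    """Shuffle 0..n-1 and split into k folds; returns (train_idx, val_idx) per fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        splits.append((train, sorted(folds[i])))
    return splits


def early_stopping(val_scores: Sequence[float], patience: int = 10) -> Tuple[int, int]:
    """Lower is better. Returns (best_epoch, stop_epoch), both 1-based."""
    best, best_epoch = float("inf"), 0
    for epoch, score in enumerate(val_scores, start=1):
        if score < best:
            best, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_scores)


# Per-epoch val_mse of fold 0, copied from the training log.
fold0_val_mse = [241.2657, 121.9999, 95.1360, 78.8414, 91.4451, 73.3333, 140.8205, 82.6432,
                 111.2105, 89.3108, 84.5067, 95.7098, 114.3849, 77.3083, 75.9647, 90.1548]
print(early_stopping(fold0_val_mse))  # (6, 16)

splits = kfold_indices(739, k=5)
print([len(val) for _, val in splits])  # [148, 148, 148, 148, 147]
```

The reported fold result is the validation score at the best epoch (fold 0: mse 73.3333), not at the epoch where training stopped.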
Uni-Mol training output (condensed; per-epoch log lines omitted):
- Anomaly cleaning with a 3-sigma threshold: 744 -> 739 molecules.
- Conformer generation failed for 0.00% of molecules (0.00% for 3D conformers).
- The output directory ./exp_reg_mp_0827 already existed and was overwritten.
- For each fold, the pretrained weights mol_pre_all_h_220816.pt are loaded and the unimolv1 model is fine-tuned for up to 50 epochs, stopping early when validation performance stops improving.
- fold 0: early stopping at epoch 16; mse 73.3333, mae 6.7058, pearsonr 0.9026, spearmanr 0.8839, r2 0.7963
- fold 1: early stopping at epoch 27; mse 69.4507, mae 6.4284, pearsonr 0.9274, spearmanr 0.9266, r2 0.8370
- fold 2: early stopping at epoch 44; mse 67.5096, mae 5.9036, pearsonr 0.8790, spearmanr 0.8756, r2 0.7657
- fold 3: early stopping at epoch 23; mse 69.5149, mae 6.1232, pearsonr 0.9165, spearmanr 0.8885, r2 0.8309
- fold 4: training in progress (the log continues from epoch 19).
val_loss: 0.2512, val_mse: 96.7210, lr: 0.000064, 3.8s 2024-08-27 00:47:24 | unimol/tasks/trainer.py | 169 | INFO | Uni-Mol(QSAR) | Epoch [20/50] train_loss: 0.0747, val_loss: 0.2534, val_mse: 79.7192, lr: 0.000062, 3.8s 2024-08-27 00:47:24 | unimol/utils/metrics.py | 228 | WARNING | Uni-Mol(QSAR) | Early stopping at epoch: 20 2024-08-27 00:47:25 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success! 2024-08-27 00:47:25 | unimol/models/nnmodel.py | 129 | INFO | Uni-Mol(QSAR) | fold 4, result {'mse': 66.10166, 'mae': 6.282986, 'pearsonr': 0.9170654557897265, 'spearmanr': 0.9023950947737711, 'r2': 0.8399314257477837} 2024-08-27 00:47:25 | unimol/models/nnmodel.py | 144 | INFO | Uni-Mol(QSAR) | Uni-Mol metrics score: {'mse': 69.18618807017577, 'mae': 6.288806721480499, 'pearsonr': 0.908591815385179, 'spearmanr': 0.8953624233549141, 'r2': 0.818854840030007} 2024-08-27 00:47:25 | unimol/models/nnmodel.py | 145 | INFO | Uni-Mol(QSAR) | Uni-Mol & Metric result saved!
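Two patterns in the fold logs above can be reproduced with a few lines of Python.

First, each reported fold `mse` matches the `val_mse` of one specific epoch: 69.5149 at epoch 13 for fold 3, and 66.1017 at epoch 10 for fold 4. For fold 4, epoch 10 is also the epoch with the lowest `val_loss`. In both folds the `Early stopping` warning fires exactly 10 epochs later, at epochs 23 and 20. This is consistent with patience-based early stopping with a patience of 10, followed by reloading the best checkpoint (`load model success!`).

Second, the `lr` column is consistent with a linear warmup over the first 3% of the 50 epochs to a peak of 1e-4, followed by linear decay to zero.

The patience, peak learning rate and warmup ratio are all inferred from the printed numbers, not read from Uni-Mol's configuration. `early_stop_epoch` and `linear_warmup_decay_lr` are hypothetical helpers, not functions of the `unimol` package.

```python
from typing import List, Tuple


def early_stop_epoch(val_losses: List[float], patience: int = 10) -> Tuple[int, int]:
    """Hypothetical helper: return (best_epoch, stop_epoch), both 1-indexed.

    Training stops once `patience` consecutive epochs fail to improve on the
    lowest validation loss seen so far; the best epoch's checkpoint is kept.
    """
    best_loss = float("inf")
    best_epoch = 0
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses)  # never triggered: ran all epochs


def linear_warmup_decay_lr(progress: float, total: float = 50.0,
                           peak_lr: float = 1e-4, warmup_ratio: float = 0.03) -> float:
    """Hypothetical helper: learning rate after `progress` epochs.

    Linear warmup from 0 to `peak_lr` over the first `warmup_ratio` of training,
    then linear decay back to 0 at `total` epochs (parameters inferred from the log).
    """
    warmup = warmup_ratio * total
    if progress < warmup:
        return peak_lr * progress / warmup
    return peak_lr * max(0.0, (total - progress) / (total - warmup))


# Fold 4 values copied from the log above (epochs 1-20).
fold4_val_loss = [0.5564, 0.4616, 0.2935, 0.2953, 0.5017, 0.2745, 0.3681, 0.2064, 0.2413, 0.1812,
                  0.3098, 0.2648, 0.2951, 0.2126, 0.2853, 0.2096, 0.2288, 0.2481, 0.2512, 0.2534]
fold4_lr = ["0.000067", "0.000099", "0.000097", "0.000095", "0.000093", "0.000091", "0.000089",
            "0.000087", "0.000085", "0.000082", "0.000080", "0.000078", "0.000076", "0.000074",
            "0.000072", "0.000070", "0.000068", "0.000066", "0.000064", "0.000062"]

print(early_stop_epoch(fold4_val_loss))  # (10, 20)
predicted_lr = [f"{linear_warmup_decay_lr(epoch):.6f}" for epoch in range(1, 21)]
print(predicted_lr == fold4_lr)  # True
```

With the fold 4 validation losses copied from the log, the sketch returns best epoch 10 and stop epoch 20. The assumed schedule reproduces every `lr` value printed for fold 4 when formatted to six decimals. This supports the inferred parameters but does not prove that Uni-Mol uses them.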
2024-08-27 00:57:16 | unimol/data/conformer.py | 62 | INFO | Uni-Mol(QSAR) | Start generating conformers...
745it [00:02, 288.52it/s]
2024-08-27 00:57:18 | unimol/data/conformer.py | 66 | INFO | Uni-Mol(QSAR) | Failed to generate conformers for 0.00% of molecules.
2024-08-27 00:57:18 | unimol/data/conformer.py | 68 | INFO | Uni-Mol(QSAR) | Failed to generate 3d conformers for 0.00% of molecules.
2024-08-27 00:57:19 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2024-08-27 00:57:19 | unimol/models/nnmodel.py | 154 | INFO | Uni-Mol(QSAR) | start predict NNModel:unimolv1
2024-08-27 00:57:20 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-08-27 00:57:21 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-08-27 00:57:22 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-08-27 00:57:23 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-08-27 00:57:25 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-08-27 00:57:26 | unimol/predict.py | 66 | INFO | Uni-Mol(QSAR) | final predict metrics score: {'mse': 23.636309103815254, 'mae': 3.233425275365778, 'pearsonr': 0.9715887800423287, 'spearmanr': 0.9797724604404502, 'r2': 0.9423627791494944}
2024-08-27 00:57:26 | unimol/data/conformer.py | 62 | INFO | Uni-Mol(QSAR) | Start generating conformers...
186it [00:00, 260.81it/s]
2024-08-27 00:57:27 | unimol/data/conformer.py | 66 | INFO | Uni-Mol(QSAR) | Failed to generate conformers for 0.00% of molecules.
2024-08-27 00:57:27 | unimol/data/conformer.py | 68 | INFO | Uni-Mol(QSAR) | Failed to generate 3d conformers for 0.00% of molecules.
2024-08-27 00:57:28 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2024-08-27 00:57:28 | unimol/models/nnmodel.py | 154 | INFO | Uni-Mol(QSAR) | start predict NNModel:unimolv1
2024-08-27 00:57:28 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-08-27 00:57:29 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-08-27 00:57:29 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-08-27 00:57:30 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-08-27 00:57:30 | unimol/tasks/trainer.py | 213 | INFO | Uni-Mol(QSAR) | load model success!
2024-08-27 00:57:30 | unimol/predict.py | 66 | INFO | Uni-Mol(QSAR) | final predict metrics score: {'mse': 67.58912567075372, 'mae': 5.934177752708996, 'pearsonr': 0.9143943848268096, 'spearmanr': 0.9148695617738627, 'r2': 0.8355427630018085}
[Uni-Mol] MSE:67.5891
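Each prediction run above loads five checkpoints before reporting a single set of metrics: `load model success!` appears five times, presumably once per cross-validation fold. The fold models' outputs are therefore presumably combined, and averaging is the usual choice, which is assumed here.

The sketch below shows that averaging step, plus the five metrics named in the log (`mse`, `mae`, `pearsonr`, `spearmanr`, `r2`) computed with their standard scikit-learn and SciPy definitions. It illustrates these assumptions and is not Uni-Mol's implementation. `average_fold_predictions` and `regression_metrics` are hypothetical names, and the numbers are toy values.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def average_fold_predictions(fold_preds):
    """Hypothetical: average k fold models' predictions, shape (k, n) -> (n,)."""
    return np.mean(np.asarray(fold_preds, dtype=float), axis=0)


def regression_metrics(y_true, y_pred):
    """Standard definitions of the five metrics that appear in the Uni-Mol log."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return {
        "mse": mean_squared_error(y_true, y_pred),
        "mae": mean_absolute_error(y_true, y_pred),
        # [0] selects the correlation coefficient; [1] would be the p-value.
        "pearsonr": pearsonr(y_true, y_pred)[0],
        "spearmanr": spearmanr(y_true, y_pred)[0],
        "r2": r2_score(y_true, y_pred),
    }


# Toy example (made-up values): 3 fold models, 4 molecules.
fold_preds = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 2.0, 4.0, 4.0],
    [0.0, 2.0, 2.0, 4.0],
]
y_true = [1.0, 2.0, 3.0, 5.0]
y_pred = average_fold_predictions(fold_preds)
metrics = regression_metrics(y_true, y_pred)
print(y_pred)
print(metrics)
```

For the toy values, the averaged predictions are `[1.0, 2.0, 3.0, 4.0]`. Only the last molecule is off by 1, so both MSE and MAE are 0.25.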
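Results Overview
Finally, we compare the predictive performance of Uni-Mol and of the 1D-QSAR, 2D-QSAR and 3D-QSAR representations combined with different models, all on the same dataset. The table below is sorted by MSE in ascending order (lower is better). The "error" column shows the first few per-molecule errors of each model.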
Model | MSE | error |
---|---|---|
Uni-Mol | 67.589126 | [1.4347641754150389, 5.46052307128906, 2.79029... |
META-K-Nearest Neighbors | 117.586869 | [0.5159999999999996, 9.784000000000006, 1.4319... |
META-XGBoost | 124.539062 | [0.08620598793029766, 8.571835937499998, 0.191... |
Top5_Meta | 124.571358 | [0.6811066381779587, 8.628646193471205, 1.5170... |
META-Gradient Boosting | 129.000935 | [0.829192415780553, 8.881948341419132, 1.74654... |
META-Random Forest | 131.602186 | [0.25830000000000064, 7.406816666666664, 0.975... |
META-ElasticNet Regression | 132.327727 | [1.7158347871789421, 8.498630021770225, 3.2397... |
META-Lasso Regression | 132.40948 | [1.4709553609447004, 7.940783905566981, 3.3159... |
2D-QSAR-XGBoost | 134.418465 | [2.2681606101989744, 8.300439300537107, 4.9414... |
META-Ridge Regression | 136.284468 | [2.1172407427238196, 8.581391454504171, 2.9702... |
META-Linear Regression | 136.290314 | [2.1174870964079724, 8.580701661409655, 2.9699... |
META-Decision Tree | 139.116705 | [0.0, 9.349999999999994, 0.0, 0.0, 3.420000000... |
2D-QSAR-Ridge Regression | 146.424461 | [5.3536414252557325, 5.721553409130237, 6.7322... |
2D-QSAR-Random Forest | 147.623964 | [5.441400000000012, 14.556749999999994, 8.0977... |
2D-QSAR-Gradient Boosting | 164.313185 | [9.087225250360694, 12.00466684034319, 15.5987... |
META-Support Vector | 165.375313 | [2.2348044247247736, 9.848231713468863, 3.9497... |
1D-QSAR-Random Forest | 180.932561 | [4.180580000000008, 3.8885173809523863, 3.9988... |
2D-QSAR-K-Nearest Neighbors | 187.541235 | [7.565999999999999, 16.536, 24.302, 21.058, 11... |
3D-QSAR-Gradient Boosting | 196.913228 | [4.384201056243583, 5.61871312843914, 26.41189... |
1D-QSAR-K-Nearest Neighbors | 201.740644 | [6.772, 15.943999999999999, 0.0359999999999871... |
1D-QSAR-Gradient Boosting | 203.712632 | [4.424342556469414, 0.6585678523793561, 12.570... |
1D-QSAR-Decision Tree | 209.032574 | [0.0, 7.68, 0.0, 0.0, 3.4200000000000017, 3.67... |
1D-QSAR-XGBoost | 211.704988 | [0.24724909782409688, 4.3572488403320335, 0.08... |
2D-QSAR-Decision Tree | 217.870475 | [0.0, 31.209999999999997, 0.0, 0.0, 3.42000000... |
3D-QSAR-Random Forest | 234.637649 | [18.277199999999993, 9.874500000000012, 36.358... |
1D-QSAR-Lasso Regression | 243.187292 | [7.066519115384862, 0.9574225173103557, 24.220... |
1D-QSAR-Ridge Regression | 245.046642 | [8.13474528193994, 2.5750994446713236, 21.2488... |
1D-QSAR-ElasticNet Regression | 245.410252 | [8.782313994351638, 4.186733303955677, 24.3090... |
1D-QSAR-Linear Regression | 246.557773 | [8.155818147988127, 3.0879337099640054, 19.348... |
3D-QSAR-K-Nearest Neighbors | 261.194916 | [25.326000000000004, 6.207999999999991, 14.956... |
3D-QSAR-XGBoost | 273.190968 | [26.69975212097168, 19.26223892211914, 31.3196... |
3D-QSAR-Ridge Regression | 287.186349 | [15.047548675691063, 17.162533867863583, 28.14... |
3D-QSAR-Support Vector | 287.464921 | [19.566274795504764, 19.61772059947296, 20.207... |
2D-QSAR-Lasso Regression | 323.758552 | [24.157158624743346, 13.564996021738438, 21.77... |
2D-QSAR-Support Vector | 331.726256 | [25.727988127099145, 13.130659643063797, 19.55... |
1D-QSAR-Support Vector | 343.46105 | [31.21658125407704, 16.634806362424133, 17.494... |
2D-QSAR-ElasticNet Regression | 357.567552 | [25.715824138912236, 14.915624341914732, 22.46... |
3D-QSAR-ElasticNet Regression | 425.983457 | [26.925234899328856, 14.794765100671142, 24.79... |
3D-QSAR-Lasso Regression | 425.983457 | [26.925234899328856, 14.794765100671142, 24.79... |
3D-QSAR-Decision Tree | 523.690919 | [26.63, 27.88, 23.99, 0.0, 0.0, 47.28, 27.7399... |
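A leaderboard like the one above can be assembled with pandas once each model's predictions on the shared evaluation set are available. The sketch below assumes a dict mapping each model name to its prediction array. It also assumes the `error` column holds the per-molecule absolute error |y_pred - y_true|; the notebook's exact definition is not shown in this section. `rank_models` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def rank_models(y_true, predictions):
    """Hypothetical helper: build a leaderboard sorted by MSE (lower is better).

    predictions: dict mapping model name -> array of predictions on the same set.
    The 'error' column stores per-molecule absolute errors (an assumption).
    """
    y_true = np.asarray(y_true, dtype=float)
    records = []
    for name, y_pred in predictions.items():
        abs_err = np.abs(np.asarray(y_pred, dtype=float) - y_true)
        records.append({
            "Model": name,
            "MSE": float(np.mean(abs_err ** 2)),  # mean of squared errors
            "error": abs_err.tolist(),
        })
    return pd.DataFrame(records).set_index("Model").sort_values("MSE")


# Hypothetical toy data: three models, four molecules.
y_true = [10.0, 20.0, 30.0, 40.0]
predictions = {
    "model-A": [12.0, 18.0, 30.0, 41.0],
    "model-B": [10.0, 20.0, 25.0, 40.0],
    "model-C": [15.0, 25.0, 35.0, 45.0],
}
leaderboard = rank_models(y_true, predictions)
print(leaderboard)
```

With the toy data, the rows come out in the order `model-A` (MSE 2.25), `model-B` (6.25), `model-C` (25.0), matching the ascending-MSE ordering of the table above.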