Quantitative Structure-Activity Relationship (QSAR) Models from 0 to 1 & Getting Started with Uni-Mol
©️ Copyright 2023 @ Authors
Authors:
郑行 📨
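Date: 2023-06-06
License: This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Quick start: click the Start Connection (开始连接) button above, select the unimol-qsar:0703 image and any GPU node configuration, and wait a moment for the notebook to become runnable.
In recent years, artificial intelligence (AI) has advanced at an unprecedented pace, bringing major breakthroughs and change to many fields.
In drug discovery, however, scientists have been applying mathematical and statistical methods to support the development process since the last century. They build mathematical models from the structure of drug molecules to predict biochemical activity, an approach known as Quantitative Structure-Activity Relationship (QSAR) modeling. QSAR models have continued to evolve as research on drug molecules has deepened and as new AI methods have been proposed.
QSAR modeling is therefore a good microcosm of how the AI for Science field has developed. In this notebook, we use worked cases to introduce how different types of QSAR models are built.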
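Introduction
Quantitative Structure-Activity Relationship (QSAR) modeling studies the quantitative relationship between the chemical structure of compounds and their biological activity, and it is one of the most important tools in Computer-Aided Drug Design (CADD). QSAR aims to build mathematical models that relate molecular structure to biochemical and physicochemical properties, helping drug scientists make reasoned predictions about the properties of new drug molecules.
Building an effective QSAR model involves several steps:
- Construct a suitable molecular representation that converts the molecular structure into numerical values a computer can read;
- Choose a machine learning model suited to that representation, and train it on existing molecule-property data;
- Use the trained model to predict the properties of molecules that have not yet been measured.
QSAR models have evolved alongside molecular representations and the machine learning models that go with them.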
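Let's prepare some data first!
To make the process of building QSAR models easier to follow and try out, we use activity prediction for molecules against the BACE-1 target as the running example.
We first download the BACE-1 dataset: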
--2023-06-12 14:56:05--  https://bohrium-example.oss-cn-zhangjiakou.aliyuncs.com/notebook/bace_classification/train.csv
Resolving ga.dp.tech (ga.dp.tech)... 10.255.255.41
Connecting to ga.dp.tech (ga.dp.tech)|10.255.255.41|:8118... connected.
Proxy request sent, awaiting response... 200 OK
Length: 81046 (79K) [text/csv]
Saving to: ‘./datasets/BACE_train.csv’
./datasets/BACE_tra 100%[===================>]  79.15K  --.-KB/s    in 0.04s
2023-06-12 14:56:06 (1.74 MB/s) - ‘./datasets/BACE_train.csv’ saved [81046/81046]
--2023-06-12 14:56:06--  https://bohrium-example.oss-cn-zhangjiakou.aliyuncs.com/notebook/bace_classification/test.csv
Resolving ga.dp.tech (ga.dp.tech)... 10.255.255.41
Connecting to ga.dp.tech (ga.dp.tech)|10.255.255.41|:8118... connected.
Proxy request sent, awaiting response... 200 OK
Length: 22128 (22K) [text/csv]
Saving to: ‘./datasets/BACE_test.csv’
./datasets/BACE_tes 100%[===================>]  21.61K  --.-KB/s    in 0.04s
2023-06-12 14:56:06 (540 KB/s) - ‘./datasets/BACE_test.csv’ saved [22128/22128]
Next, let's look at how the dataset is organized:
------ train data ------
      SMILES                                              TARGET
0     CN1C(=O)[C@@](c2ccc(OC(F)F)cc2)(c2ccc(F)c(C#CC...  1
1     CN1C(=O)[C@@](c2ccc(OC(F)F)cc2)(c2cccc(C#CCF)c...  1
2     CN1C(=O)[C@@](c2ccc(OC(F)F)cc2)(c2cccc(C#CCCF)...  1
3     CN1C(=O)[C@@](c2ccc(OC(F)F)cc2)(c2cccc(C#CCCO)...  1
4     CN1C(=O)[C@@](c2ccc(OC(F)F)cc2)(c2cccc(C#CCCCC...  1
...   ...                                                 ...
1205  CN1C(=O)C(c2ccncc2)(c2cccc(-c3ccoc3)c2)N=C1N        0
1206  CC(=O)N[C@@H](Cc1cc(F)cc(F)c1)[C@H](O)C[NH2+]C...  0
1207  O=C(C1C[NH2+]CC1c1ccccc1Br)N1CCC(c2ccccc2)CC1c...  0
1208  Nc1cccc(Cn2c(-c3ccc(Oc4ccncc4)cc3)ccc2-c2ccccc...  0
1209  CC1(C)Cc2cc(Cl)ccc2C(NC(Cc2cscc2CCC2CC2)C(=O)[...  0
[1210 rows x 2 columns]
POSITIVE: 515
NEGATIVE: 695
------ test data ------
      SMILES                                              TARGET
0     Cc1ccccc1-c1ccc2nc(N)c(CC(C)C(=O)NCC3CCOCC3)cc2c1   1
1     CC(C)(C)Cc1cnc2c(c1)C([NH2+]CC(O)C1Cc3cccc(c3)...  1
2     COCC(=O)NC(Cc1cccc(-c2ncco2)c1)C(O)C[NH2+]C1CC...  1
3     COCC(=O)NC(Cc1cccc(-c2ccccn2)c1)C(O)C[NH2+]C1C...  1
4     CCn1cc2c3c(cc(C(=O)NC(Cc4ccccc4)[C@H](O)C[NH2+...  1
..    ...                                                 ...
298   CCn1cc2c3c(cc(C(=O)NC(Cc4ccccc4)C(=O)C[NH2+]C4...  1
299   CN1CCC(C)(c2cc(NC(=O)c3ccc(Cl)cn3)ccc2F)N=C1N       1
300   COc1cccc(C[NH2+]C[C@H](O)[C@H](Cc2cc(F)cc(F)c2...  1
301   CC(C)(C)c1cccc(C[NH2+]C2CS(=O)(=O)CC(Cc3cc(F)c...  1
302   COc1cccc(C[NH2+]CC(O)C(Cc2ccccc2)NC(=O)C(CCc2c...  1
[303 rows x 2 columns]
POSITIVE: 176
NEGATIVE: 127
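As we can see, in the BACE dataset:
- Molecules are represented as SMILES strings;
- The task is binary classification: TARGET==1 means the molecule is active against the BACE-1 target, and TARGET==0 means it is inactive.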
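The overview above can be reproduced with a few lines of standard-library Python. This is a minimal sketch, assuming the CSV files are comma-separated with a header row containing the SMILES and TARGET columns shown in the printed table; the function names and the three demo rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def load_smiles_targets(handle):
    """Read (SMILES, TARGET) pairs from an open CSV file with those two columns."""
    reader = csv.DictReader(handle)
    return [(row["SMILES"], int(row["TARGET"])) for row in reader]


def class_balance(records):
    """Count active (TARGET == 1) and inactive (TARGET == 0) molecules."""
    counts = Counter(target for _, target in records)
    return {"positive": counts[1], "negative": counts[0]}


# Tiny in-memory stand-in for ./datasets/BACE_train.csv (made-up rows).
demo_csv = io.StringIO(
    "SMILES,TARGET\n"
    "CCO,0\n"
    "c1ccccc1O,1\n"
    "CC(=O)O,0\n"
)
records = load_smiles_targets(demo_csv)
print(len(records), class_balance(records))
```

To run it on the real data, pass `open("./datasets/BACE_train.csv", newline="")` instead of the in-memory file.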
This is a typical molecular property prediction task. We will set the dataset aside for now and begin the exploration proper.
A brief history of QSAR
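QSAR grew out of Structure-Activity Relationship (SAR) analysis. SAR dates back to the late 19th century, when chemists began studying how the structure of a compound relates to its biological activity. The German chemist Emil Fischer (1852-1919) proposed the "lock-and-key" hypothesis: the interaction between a compound (the key) and its biological target (the lock) depends on how well the two fit together spatially. As understanding of intermolecular interactions deepened, it became clear that beyond spatial fit, the match between the surface properties of the target (for example hydrophobicity and electrophilicity) and the properties of the corresponding ligand structure is also crucial. This led to a family of analytical methods that relate structural features to binding affinity, namely structure-activity relationships.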
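However, SAR relies mainly on the experience and intuition of chemists and lacks a rigorous theoretical basis and a unified analytical method. To overcome these limitations, in the 1960s scientists began to use mathematical and statistical methods to analyze the relationship between molecular structure and biological activity quantitatively.
The earliest QSAR model can be traced back to 1868, when the chemist Alexander Crum Brown and the physiologist Thomas R. Fraser began to study the relationship between compound structure and biological activity. While studying the biological effects of alkaloids before and after methylation of the basic N atom, they proposed that the physiological activity of a compound depends on its constitution, that is, biological activity $\Phi$ is a function of chemical constitution $C$: $\Phi = f(C)$. This is the well-known Crum-Brown equation, and the hypothesis laid the foundation for later QSAR research.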
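QSAR models were then proposed one after another, for example Hammett's model relating the electronic properties of substituents to the reactivity of organic compounds, and Taft's steric parameter model. In 1964, Hansch and Fujita proposed the well-known Hansch model, which states that the biological activity of a molecule is determined mainly by its hydrophobic effect ($\pi$), steric effect ($E_s$), and electrostatic effect ($\sigma$), and assumes that these three effects are independent and additive. Its full form is $\log(1/C) = a\pi + bE_s + c\sigma + d$, where $C$ here denotes the concentration needed to produce a given biological response and $a$, $b$, $c$, $d$ are fitted constants. The Hansch model was the first quantitative description of the relationship between chemical information and the biological activity of drugs. It gave later QSAR research a practical theoretical framework and is regarded as an important milestone in the transition from blind drug design to rational drug design.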
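The additive assumption of the Hansch model, usually written as log(1/C) = a·π + b·E_s + c·σ + d, is simply a linear model over three substituent parameters. The sketch below evaluates it once; the coefficient and parameter values are hypothetical and chosen only to make the arithmetic easy to follow, not taken from any published fit.

```python
def hansch_log_inverse_c(pi, e_s, sigma, a, b, c, d):
    """Linear Hansch-type model: log(1/C) = a*pi + b*E_s + c*sigma + d."""
    return a * pi + b * e_s + c * sigma + d


# Hypothetical fitted constants and substituent parameters, for illustration only.
coeffs = {"a": 0.8, "b": 0.3, "c": -1.2, "d": 2.0}
log_inv_c = hansch_log_inverse_c(pi=0.5, e_s=-1.0, sigma=0.25, **coeffs)
print(f"log(1/C) = {log_inv_c:.2f}")
```

In practice the constants a, b, c, and d are obtained by regression on compounds with measured activity, which is exactly the "train on known data, predict for new molecules" pattern that later QSAR models keep.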
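Today QSAR is a mature research field that draws on many computational methods and techniques. In recent years, the rapid development of machine learning and AI has extended QSAR methods and their applications further. For example, deep learning has been applied to building QSAR models, improving their predictive power and accuracy. QSAR methods are also widely used in environmental science, materials science, and other fields, which shows their considerable potential and broad prospects.
Basic requirements for QSAR modeling
At an international meeting held in Setubal, Portugal, in 2002, the participating scientists proposed a set of rules on the validity of QSAR models, known as the "Setubal Principles". These rules were revised in more detail in November 2004 and formally named the "OECD Principles". They state that, to serve a regulatory purpose, a QSAR model should satisfy the following five conditions: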
- a defined endpoint
- an unambiguous algorithm
- a defined domain of applicability
- appropriate measures of goodness-of-fit, robustness and predictivity
- a mechanistic interpretation, if possible
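Basic workflow of QSAR modeling
Building an effective QSAR model takes three main steps:
- Construct a suitable molecular representation that converts the molecular structure into numerical values a computer can read;
- Choose a machine learning model suited to that representation, and train it on existing molecule-property data;
- Use the trained model to predict the properties of molecules that have not yet been measured.
Because a molecular structure is not in a computer-readable format, we must first convert it into a numerical vector before we can choose a suitable mathematical model for it. We call this process molecular representation. An effective molecular representation paired with a well-matched mathematical model is the core of building a QSAR model.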
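The three steps can be sketched end to end in plain Python. Everything here is a deliberately crude stand-in: the "representation" is a handful of character counts taken from the SMILES string, the "model" is a nearest-centroid classifier, and the training labels are made up and carry no chemical meaning. The point is only the shape of the workflow: featurize, fit, predict.

```python
from collections import defaultdict


def featurize(smiles):
    """Step 1 (toy representation): crude character counts from a SMILES string."""
    return [len(smiles), smiles.count("N"), smiles.count("O"), smiles.count("F")]


class NearestCentroid:
    """Step 2 (toy model): store one mean feature vector per class."""

    def fit(self, features, labels):
        grouped = defaultdict(list)
        for vector, label in zip(features, labels):
            grouped[label].append(vector)
        self.centroids = {
            label: [sum(col) / len(col) for col in zip(*vectors)]
            for label, vectors in grouped.items()
        }
        return self

    def predict(self, features):
        """Step 3: assign each vector to the class with the closest centroid."""
        def sq_dist(u, v):
            return sum((x - y) ** 2 for x, y in zip(u, v))

        return [
            min(self.centroids, key=lambda label: sq_dist(vector, self.centroids[label]))
            for vector in features
        ]


# Made-up training data: the labels are arbitrary.
train_smiles = ["CCO", "CCCO", "NCC(F)(F)F", "NCCC(F)(F)F"]
train_labels = [0, 0, 1, 1]

model = NearestCentroid().fit([featurize(s) for s in train_smiles], train_labels)
predictions = model.predict([featurize("CCCCO"), featurize("NCC(F)F")])
print(predictions)  # -> [0, 1]
```

Swapping in a better representation or a stronger model changes steps 1 and 2 but leaves this overall structure intact.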
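Molecular representation
A molecular representation is a numerical representation that carries the properties of a molecule. Common examples include molecular descriptors, molecular fingerprints, SMILES strings, and molecular potential functions.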
Wei, J., Chu, X., Sun, X. Y., Xu, K., Deng, H. X., Chen, J., ... & Lei, M. (2019). Machine learning in materials science. InfoMat, 1(3), 338-358.
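In fact, QSAR models are categorized by how much information the molecular representation contains and by the form it takes. The common categories are 1D-QSAR, 2D-QSAR, and 3D-QSAR.
Different molecular representations have different numerical characteristics, so they call for different machine learning or deep learning models. Next, we use practical cases to show how to build 1D-QSAR, 2D-QSAR, and 3D-QSAR models.
1D-QSAR molecular representation
Early QSAR models mostly represented molecules by physicochemical properties such as molecular weight, water solubility, and molecular surface area. These properties are called molecular descriptors, and this is the 1D-QSAR stage.
At this stage, experienced scientists typically design the descriptors from their domain knowledge, selecting molecular properties that may be related to the property of interest. For example, to predict whether a drug can cross the blood-brain barrier, the relevant properties may include the drug's water solubility, molecular weight, and polar surface area, so the scientist adds these to the descriptor set.
Because computers were not yet widespread or computing power was limited at this stage, scientists usually worked with relatively simple mathematical models such as linear regression and random forests. These models are also a good fit because descriptor-based molecular representations are usually low-dimensional real-valued vectors.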
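To make "a low-dimensional real-valued vector" concrete, here is a minimal sketch that derives three simple descriptors from a molecular formula. It is an illustration only: the function name and the choice of descriptors are assumptions, and in practice a cheminformatics toolkit would compute richer descriptors such as logP or polar surface area directly from the structure.

```python
# Standard atomic weights (g/mol), rounded to three decimals.
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "F": 18.998}


def simple_descriptors(atom_counts):
    """Return [molecular weight, heavy-atom count, heteroatom count]."""
    mol_weight = sum(ATOMIC_WEIGHT[el] * n for el, n in atom_counts.items())
    heavy_atoms = sum(n for el, n in atom_counts.items() if el != "H")
    hetero_atoms = sum(n for el, n in atom_counts.items() if el not in ("H", "C"))
    return [round(mol_weight, 3), heavy_atoms, hetero_atoms]


# Aspirin, C9H8O4, given directly as element counts.
aspirin = {"C": 9, "H": 8, "O": 4}
descriptor = simple_descriptors(aspirin)
print(descriptor)  # -> [180.159, 13, 4]
```

Each molecule becomes one short row of numbers, which is why classical models such as linear or logistic regression and tree ensembles apply directly.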
[433.40500000000026, 3.558600000000002, 1, 4, 67.92, 6, 2, 1, 0, 9, 162, 0, 0.4304551686487377]
/opt/conda/lib/python3.8/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import MultiIndex, Int64Index
/opt/conda/lib/python3.8/site-packages/sklearn/linear_model/_logistic.py:763: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[1D-QSAR][Logistic Regression] ACC:0.5644 AUC:0.6396
[1D-QSAR][Stochastic Gradient Descent] ACC:0.4191 AUC:0.5028
[1D-QSAR][K-Nearest Neighbors] ACC:0.6040 AUC:0.6287
[1D-QSAR][Bernoulli Naive Bayes] ACC:0.4092 AUC:0.4571
[1D-QSAR][Decision Tree] ACC:0.5842 AUC:0.5964
[1D-QSAR][Random Forest] ACC:0.6172 AUC:0.6790
[1D-QSAR][XGBoost] ACC:0.6502 AUC:0.6832
[1D-QSAR][Multi-layer Perceptron] ACC:0.5710 AUC:0.6488
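The ConvergenceWarning in this output recommends scaling the data, and the example descriptor vector printed earlier shows why: its entries range from about 0.43 to 433.4. A common remedy is to standardize each descriptor column to zero mean and unit variance before fitting gradient-based models. The sketch below does this in plain Python; the function names and descriptor rows are made up, and whether scaling changes the scores reported here has not been checked.

```python
from statistics import mean, pstdev


def fit_standardizer(rows):
    """Compute per-column mean and population standard deviation."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Guard against constant columns, whose standard deviation is zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def standardize(rows, means, stds):
    """Rescale each column using previously fitted statistics."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Made-up descriptor rows: [molecular weight, logP, constant flag].
train = [[400.0, 3.0, 1], [300.0, 1.0, 1], [500.0, 5.0, 1]]
means, stds = fit_standardizer(train)
scaled = standardize(train, means, stds)
print(scaled[0])  # -> [0.0, 0.0, 0.0]
```

The statistics should be fitted on the training set only and then reused unchanged for the test set, so that no information from the test molecules leaks into training.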
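2D-QSAR molecular representation
However, for molecular property prediction problems whose biochemical mechanism is still unclear, scientists may find it hard to design effective descriptors, and the QSAR model fails. Because molecular properties are largely determined by molecular structure, for example by which functional groups are present, researchers wanted to bring the bonding connectivity of the molecule into QSAR modeling. The field thus entered the 2D-QSAR stage.
Among the earliest proposals were molecular fingerprints such as the Morgan fingerprint, which characterize a molecule by traversing each atom and its bonding relationships with the surrounding atoms. So that molecules of different sizes can be represented by numerical vectors of the same length, fingerprints usually apply a hash operation to fix the vector length, which is why they are typically high-dimensional 0/1 vectors. In this setting, scientists usually choose machine learning methods that handle high-dimensional sparse vectors well, such as support vector machines and fully connected neural networks.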
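The two ideas in this paragraph, growing atom environments outward along bonds and hashing them into a fixed-length bit vector, can be sketched in plain Python. This is a simplified Morgan-style scheme written for illustration, not the algorithm as implemented in any toolkit: atom invariants are reduced to element and degree, duplicate environments are not pruned, and the function names, radius, and 64-bit length are assumptions.

```python
import hashlib


def stable_hash(obj):
    """Deterministic integer hash (Python's built-in hash() is salted for strings)."""
    return int.from_bytes(hashlib.md5(repr(obj).encode()).digest()[:4], "big")


def circular_fingerprint(atoms, bonds, radius=2, n_bits=64):
    """Simplified Morgan-style fingerprint for a molecular graph.

    atoms: list of element symbols; bonds: list of (i, j, bond_order) tuples.
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j, order in bonds:
        neighbors[i].append((order, j))
        neighbors[j].append((order, i))

    # Radius 0: each atom is identified by its element and its degree.
    ids = [stable_hash((atoms[i], len(neighbors[i]))) for i in range(len(atoms))]
    features = set(ids)
    for _ in range(radius):
        # Grow each environment by one bond: combine own id with sorted neighbor ids.
        ids = [
            stable_hash((ids[i], sorted((order, ids[j]) for order, j in neighbors[i])))
            for i in range(len(atoms))
        ]
        features.update(ids)

    # Fold the variable-sized identifier set into a fixed-length 0/1 vector.
    bits = [0] * n_bits
    for feature in features:
        bits[feature % n_bits] = 1
    return bits


# Ethanol (C-C-O) and propan-1-ol (C-C-C-O), heavy atoms only.
ethanol = circular_fingerprint(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
propanol = circular_fingerprint(["C", "C", "C", "O"], [(0, 1, 1), (1, 2, 1), (2, 3, 1)])
print(len(ethanol), len(propanol), sum(ethanol), sum(propanol))
```

Both molecules map to vectors of the same length even though they have different numbers of atoms, and because identifiers depend only on each atom's environment, renumbering the atoms leaves the fingerprint unchanged. The price of folding is that distinct environments can collide on the same bit.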
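As AI models developed, deep learning models such as recurrent neural networks (RNNs) for sequence data (for example text), convolutional neural networks (CNNs) for image data, and graph neural networks (GNNs) for unstructured graph data were proposed and applied. QSAR models adopted molecular representations matched to the data each model can handle: SMILES strings are modeled with RNNs, 2D images of molecules with CNNs, and the bonding topology of molecules, converted into graphs, with GNNs. This gave rise to a series of QSAR modeling methods.
Overall, in the 2D-QSAR stage, the various methods all parse the bonding connectivity (topology) of a molecule in order to model and predict its properties.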
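Before a sequence model such as an RNN can read a SMILES string, the string has to be split into chemically meaningful tokens and mapped to integer ids. The sketch below shows one way to do that with a regular expression. The pattern is a simplified assumption covering common SMILES syntax rather than the full grammar, and the example molecule is made up.

```python
import re

SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]"           # bracket atoms such as [NH2+] or [C@@H]
    r"|Br|Cl"               # two-letter organic-subset elements
    r"|%\d{2}"              # two-digit ring-closure labels
    r"|[A-Za-z]"            # single-letter atoms (aromatic ones are lowercase)
    r"|\d"                  # ring-closure digits
    r"|[=#\-+()/\\.:~*@$]"  # bonds, branches, and other symbols
)


def tokenize(smiles):
    """Split a SMILES string into tokens, rejecting unrecognized characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


def encode(tokens, vocab):
    """Map tokens to integer ids, growing the vocabulary on first sight."""
    return [vocab.setdefault(tok, len(vocab)) for tok in tokens]


vocab = {}
tokens = tokenize("O=C(Cl)c1ccc([NH3+])cc1")
ids = encode(tokens, vocab)
print(tokens[:8])  # -> ['O', '=', 'C', '(', 'Cl', ')', 'c', '1']
print(ids[:8])     # -> [0, 1, 2, 3, 4, 5, 6, 7]
```

Treating `Cl` and `[NH3+]` as single tokens matters: splitting them character by character would turn chlorine into a carbon followed by a stray letter. The integer ids are what an embedding layer in front of an RNN would consume.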
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[2D-QSAR][Logistic Regression] ACC:0.6799 AUC:0.7582
/opt/conda/lib/python3.8/site-packages/sklearn/linear_model/_logistic.py:763: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[2D-QSAR][Stochastic Gradient Descent] ACC:0.6865 AUC:0.7424
[2D-QSAR][K-Nearest Neighbors] ACC:0.7162 AUC:0.7419
[2D-QSAR][Bernoulli Naive Bayes] ACC:0.6568 AUC:0.7329
[2D-QSAR][Decision Tree] ACC:0.6139 AUC:0.6369
[2D-QSAR][Random Forest] ACC:0.6601 AUC:0.7660
[2D-QSAR][XGBoost] ACC:0.6733 AUC:0.7529
[2D-QSAR][Multi-layer Perceptron] ACC:0.6601 AUC:0.7138
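The benchmark outputs in this notebook report two metrics per model, ACC and AUC. As a reminder of what they measure, here is a minimal plain-Python sketch: accuracy compares thresholded scores with the labels, while ROC AUC is the probability that a randomly chosen active molecule is scored above a randomly chosen inactive one. The 0.5 threshold and the four example scores are assumptions for illustration.

```python
def accuracy(y_true, y_score, threshold=0.5):
    """Fraction of molecules whose thresholded score matches the label."""
    hits = sum((score >= threshold) == bool(label) for label, score in zip(y_true, y_score))
    return hits / len(y_true)


def roc_auc(y_true, y_score):
    """Probability that a random active is scored above a random inactive.

    Ties count as half a win. Requires at least one example of each class.
    """
    pos = [s for label, s in zip(y_true, y_score) if label == 1]
    neg = [s for label, s in zip(y_true, y_score) if label == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(f"ACC:{accuracy(labels, scores):.4f} AUC:{roc_auc(labels, scores):.4f}")
# -> ACC:0.7500 AUC:0.7500
```

Two reference points help when reading the tables. A model that gives every molecule the same score has AUC 0.5, and since 127 of the 303 test molecules are inactive, a model that labels everything inactive scores ACC 127/303 ≈ 0.4191. A result near those values suggests the model has collapsed to a single class rather than learned anything.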
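3D-QSAR molecular representation
However, because of intermolecular and intramolecular interactions, molecules with similar topology adopt different conformations in different environments. The conformations a molecule takes in each environment, and their corresponding energies, determine its actual properties. Scientists therefore want to bring the three-dimensional structure of molecules into QSAR modeling to improve property prediction in specific scenarios. This stage is called the 3D-QSAR stage.
Comparative Molecular Field Analysis (CoMFA) is a widely used 3D-QSAR model. It represents the 3D structure of a molecule by computing the forces (that is, the force field) acting at positions in the space around the molecule, with the positions usually chosen on a grid. The field has also seen other useful attempts, including representations based on electron density or 3D molecular images, and adding geometric information to molecular graphs.
To handle such high-dimensional spatial information, scientists usually model with deep learning methods such as deeper fully connected neural networks (FCNNs), 3D-CNNs, and GNNs.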
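The grid idea behind field-based 3D descriptors can be sketched in a few lines: place a regular lattice around the molecule and record, at every lattice point, the value of a field generated by the atoms. The example below uses a Coulomb-like potential from point charges in arbitrary units. It is an illustration of the grid construction only, not CoMFA itself (which also uses steric fields and requires aligned molecules), and the two-atom "molecule", charges, and grid spacing are hypothetical.

```python
import itertools
import math


def grid_points(lower, upper, spacing):
    """Regular 3D lattice covering the cube [lower, upper]^3."""
    steps = int(round((upper - lower) / spacing)) + 1
    axis = [lower + k * spacing for k in range(steps)]
    return list(itertools.product(axis, axis, axis))


def electrostatic_field(atoms, points, min_distance=0.5):
    """Coulomb-like potential of point charges sampled at each grid point.

    atoms: list of ((x, y, z), partial_charge). Units are arbitrary here.
    Distances are clamped to avoid the singularity at atom positions.
    """
    field = []
    for point in points:
        potential = 0.0
        for position, charge in atoms:
            distance = max(math.dist(point, position), min_distance)
            potential += charge / distance
        field.append(potential)
    return field


# Hypothetical two-atom "molecule" with opposite partial charges.
molecule = [((-0.6, 0.0, 0.0), +0.4), ((0.6, 0.0, 0.0), -0.4)]
points = grid_points(lower=-2.0, upper=2.0, spacing=2.0)  # 3 x 3 x 3 lattice
descriptor = electrostatic_field(molecule, points)
print(len(descriptor))  # -> 27
```

The descriptor length grows with the cube of the grid resolution, which is why field-based representations quickly become high-dimensional and why the choice of downstream model matters.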
10000
/opt/conda/lib/python3.8/site-packages/sklearn/linear_model/_logistic.py:763: ConvergenceWarning: lbfgs failed to converge (status=2):
ABNORMAL_TERMINATION_IN_LNSRCH.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[3D-QSAR][Logistic Regression] ACC:0.4191 AUC:0.5000
/opt/conda/lib/python3.8/site-packages/sklearn/linear_model/_stochastic_gradient.py:574: ConvergenceWarning: Maximum number of iteration reached before convergence. Consider increasing max_iter to improve the fit.
  warnings.warn("Maximum number of iteration reached before "
[3D-QSAR][Stochastic Gradient Descent] ACC:0.3762 AUC:0.3785
[3D-QSAR][K-Nearest Neighbors] ACC:0.6040 AUC:0.7008
[3D-QSAR][Bernoulli Naive Bayes] ACC:0.6568 AUC:0.6658
[3D-QSAR][Decision Tree] ACC:0.5974 AUC:0.5906
[3D-QSAR][Random Forest] ACC:0.6601 AUC:0.7395
[3D-QSAR][XGBoost] ACC:0.6436 AUC:0.7290
[3D-QSAR][Multi-layer Perceptron] ACC:0.6304 AUC:0.6524
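Uni-Mol: a molecular representation learning and pretraining framework
Pretrained models
In drug discovery, a major challenge for QSAR modeling is limited data. Drug activity data are costly to obtain and the experiments are difficult, so labeled data are scarce. Insufficient data hurts predictive power because the model may not capture enough information to describe the relationship between compound structure and biological activity.
In fields where machine learning is more mature, such as natural language processing (NLP) and computer vision (CV), the pretrain-finetune paradigm has become the standard answer to scarce labeled data. Pretraining means first training the model on a large amount of unlabeled data through self-supervised learning, so that it acquires basic knowledge and general capabilities. The model is then fine-tuned with supervised learning on the limited labeled data, so that it can reason about the specific problem.
For example, suppose I want to classify images of cats and dogs but have few labeled images. I can first pretrain the model on a large number of unlabeled images so that it learns basic notions of points, lines, surfaces, and contours, and then train it with supervision on the cat and dog images. At that point the model may be able to use the contour information to learn quickly what a cat is and what a dog is.
Pretraining makes full use of the information in large amounts of easily obtained unlabeled data and improves the generalization and predictive performance of the model. In QSAR modeling, we can borrow the same idea to address problems of data quantity and data quality.
Introduction to Uni-Mol
Uni-Mol is a universal molecular representation learning framework based on 3D molecular structure, released by DP Technology (深势科技) in May 2022. Uni-Mol takes the 3D structure of a molecule as model input and is pretrained on about 200 million small-molecule conformations and 3 million protein surface pocket structures, using two self-supervised tasks: atom type recovery and atom coordinate recovery.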
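The two self-supervised tasks can be pictured as a corruption step applied to each conformer: hide the types of some atoms, perturb their coordinates, and ask the model to recover both. The sketch below builds one such training example in plain Python. It is an illustration of the idea and not Uni-Mol's actual data pipeline; the function name, the 15% mask ratio, the uniform noise range, and the placeholder geometry are all assumptions.

```python
import random

MASK = "[MASK]"


def corrupt_conformer(atoms, coords, mask_ratio=0.15, noise=1.0, seed=0):
    """Build one self-supervised example from a conformer.

    A random subset of atoms has its type replaced by MASK and its coordinates
    perturbed by uniform noise in [-noise, noise]; a model would be trained to
    recover the original types and coordinates at those positions.
    """
    rng = random.Random(seed)
    n_masked = max(1, round(mask_ratio * len(atoms)))
    masked = sorted(rng.sample(range(len(atoms)), n_masked))

    noisy_atoms = list(atoms)
    noisy_coords = [tuple(c) for c in coords]
    for i in masked:
        noisy_atoms[i] = MASK
        noisy_coords[i] = tuple(x + rng.uniform(-noise, noise) for x in coords[i])

    # Recovery targets: original atom type and coordinates at masked positions.
    targets = {i: (atoms[i], tuple(coords[i])) for i in masked}
    return noisy_atoms, noisy_coords, targets


# Ethanol atom types with placeholder coordinates (not a real geometry).
atoms = ["C", "C", "O", "H", "H", "H", "H", "H", "H"]
coords = [(float(i), 0.0, 0.0) for i in range(len(atoms))]
noisy_atoms, noisy_coords, targets = corrupt_conformer(atoms, coords)
print(noisy_atoms.count(MASK), sorted(targets))
```

No labels are needed for this step, which is what makes it possible to pretrain on hundreds of millions of conformations before fine-tuning on a small labeled set such as BACE-1.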
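Uni-Mol paper: https://openreview.net/forum?id=6K2RM6wVqKu
Source code: https://github.com/dptech-corp/Uni-Mol
Representation learning that starts from 3D information, combined with an effective pretraining scheme, lets Uni-Mol surpass the state of the art (SOTA) on almost all downstream tasks related to drug molecules and protein pockets. It also allows Uni-Mol to directly perform tasks that involve generating 3D conformations, such as molecular conformation generation and protein-ligand binding pose prediction, where it outperforms existing solutions. The paper was accepted at ICLR 2023, a top machine learning conference.
Next, we use Uni-Mol to build a model for the BACE-1 molecular activity prediction task: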
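Two patterns are visible in the training log that follows. The learning rate climbs to 0.000100 over the first three epochs and then falls by roughly one percent of the peak per epoch, and each fold stops ten epochs after its best validation AUC (for example, best at epoch 26 and stopped at epoch 36 in fold 0). The sketch below reproduces both behaviours in plain Python. It is inferred from the logged values only and is not taken from the Uni-Mol source code; the function and class names are hypothetical.

```python
def lr_at_epoch(epoch, peak_lr=1e-4, warmup_epochs=3, total_epochs=100):
    """Linear warmup to peak_lr, then linear decay to zero at total_epochs."""
    if epoch <= warmup_epochs:
        return peak_lr * epoch / warmup_epochs
    return peak_lr * (total_epochs - epoch) / (total_epochs - warmup_epochs)


class EarlyStopping:
    """Stop when the validation metric has not improved for `patience` epochs."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best = float("-inf")
        self.best_epoch = 0
        self.bad_epochs = 0

    def step(self, epoch, metric):
        """Record one epoch; return True when training should stop."""
        if metric > self.best:
            self.best, self.best_epoch, self.bad_epochs = metric, epoch, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Short made-up validation AUC curve, with patience reduced to 2 for brevity.
stopper = EarlyStopping(patience=2)
for epoch, val_auc in enumerate([0.70, 0.80, 0.79, 0.78, 0.90], start=1):
    if stopper.step(epoch, val_auc):
        break
print(epoch, stopper.best_epoch, f"{lr_at_epoch(epoch):.6f}")  # -> 4 2 0.000099
```

After stopping, the weights from the best epoch are the ones worth keeping, which matches the per-fold result lines in the log reporting the best validation AUC rather than the last one.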
2023-06-12 14:58:27 | unimol/data/conformer.py | 56 | INFO | Uni-Mol(QSAR) | Start generating conformers... 1210it [00:35, 34.35it/s] 2023-06-12 14:59:02 | unimol/data/conformer.py | 60 | INFO | Uni-Mol(QSAR) | Failed to generate conformers for 0.00% of molecules. 2023-06-12 14:59:02 | unimol/data/conformer.py | 62 | INFO | Uni-Mol(QSAR) | Failed to generate 3d conformers for 0.00% of molecules. 2023-06-12 14:59:02 | unimol/train.py | 83 | INFO | Uni-Mol(QSAR) | Create output directory: ./exp 2023-06-12 14:59:03 | unimol/models/unimol.py | 107 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt 2023-06-12 14:59:03 | unimol/models/nnmodel.py | 100 | INFO | Uni-Mol(QSAR) | start training Uni-Mol:unimolv1 2023-06-12 14:59:11 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [1/100] train_loss: 0.6934, val_loss: 0.6591, val_auc: 0.7798, lr: 0.000033, 8.1s 2023-06-12 14:59:14 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [2/100] train_loss: 0.6685, val_loss: 0.6070, val_auc: 0.7907, lr: 0.000067, 2.0s 2023-06-12 14:59:16 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [3/100] train_loss: 0.6200, val_loss: 0.5608, val_auc: 0.8223, lr: 0.000100, 1.9s 2023-06-12 14:59:18 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [4/100] train_loss: 0.6133, val_loss: 0.5103, val_auc: 0.8321, lr: 0.000099, 1.9s 2023-06-12 14:59:21 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [5/100] train_loss: 0.5343, val_loss: 0.5365, val_auc: 0.8518, lr: 0.000098, 1.9s 2023-06-12 14:59:23 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [6/100] train_loss: 0.5737, val_loss: 0.4248, val_auc: 0.8910, lr: 0.000097, 1.9s 2023-06-12 14:59:25 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [7/100] train_loss: 0.5288, val_loss: 0.7240, val_auc: 0.8826, lr: 0.000096, 1.9s 2023-06-12 14:59:27 | 
unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [8/100] train_loss: 0.5405, val_loss: 0.6963, val_auc: 0.8915, lr: 0.000095, 1.9s 2023-06-12 14:59:29 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [9/100] train_loss: 0.5392, val_loss: 0.3636, val_auc: 0.9043, lr: 0.000094, 1.9s 2023-06-12 14:59:32 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [10/100] train_loss: 0.4900, val_loss: 0.4316, val_auc: 0.9030, lr: 0.000093, 1.9s 2023-06-12 14:59:34 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [11/100] train_loss: 0.4650, val_loss: 0.5030, val_auc: 0.9011, lr: 0.000092, 1.9s 2023-06-12 14:59:36 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [12/100] train_loss: 0.4727, val_loss: 0.4408, val_auc: 0.8966, lr: 0.000091, 1.9s 2023-06-12 14:59:37 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [13/100] train_loss: 0.4607, val_loss: 0.4568, val_auc: 0.8924, lr: 0.000090, 1.9s 2023-06-12 14:59:39 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [14/100] train_loss: 0.4705, val_loss: 0.4173, val_auc: 0.9099, lr: 0.000089, 1.9s 2023-06-12 14:59:42 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [15/100] train_loss: 0.4242, val_loss: 0.4201, val_auc: 0.8996, lr: 0.000088, 1.9s 2023-06-12 14:59:44 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [16/100] train_loss: 0.4649, val_loss: 0.5397, val_auc: 0.8761, lr: 0.000087, 1.9s 2023-06-12 14:59:45 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [17/100] train_loss: 0.4237, val_loss: 0.3735, val_auc: 0.9057, lr: 0.000086, 1.9s 2023-06-12 14:59:47 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [18/100] train_loss: 0.4233, val_loss: 0.4938, val_auc: 0.8911, lr: 0.000085, 1.9s 2023-06-12 14:59:49 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [19/100] train_loss: 0.3827, val_loss: 0.3809, val_auc: 0.9090, lr: 0.000084, 1.9s 2023-06-12 14:59:51 | 
unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [20/100] train_loss: 0.3852, val_loss: 0.4536, val_auc: 0.9067, lr: 0.000082, 1.9s 2023-06-12 14:59:53 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [21/100] train_loss: 0.3700, val_loss: 0.4139, val_auc: 0.9074, lr: 0.000081, 1.9s 2023-06-12 14:59:55 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [22/100] train_loss: 0.4123, val_loss: 0.5209, val_auc: 0.9114, lr: 0.000080, 1.9s 2023-06-12 14:59:57 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [23/100] train_loss: 0.4269, val_loss: 0.4196, val_auc: 0.9045, lr: 0.000079, 1.9s 2023-06-12 14:59:59 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [24/100] train_loss: 0.3450, val_loss: 0.5998, val_auc: 0.8819, lr: 0.000078, 1.9s 2023-06-12 15:00:01 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [25/100] train_loss: 0.3624, val_loss: 0.4902, val_auc: 0.8987, lr: 0.000077, 1.9s 2023-06-12 15:00:03 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [26/100] train_loss: 0.3401, val_loss: 0.4433, val_auc: 0.9182, lr: 0.000076, 1.9s 2023-06-12 15:00:05 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [27/100] train_loss: 0.3394, val_loss: 0.7007, val_auc: 0.8884, lr: 0.000075, 1.9s 2023-06-12 15:00:07 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [28/100] train_loss: 0.3713, val_loss: 0.4511, val_auc: 0.9146, lr: 0.000074, 1.9s 2023-06-12 15:00:09 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [29/100] train_loss: 0.3090, val_loss: 0.5179, val_auc: 0.8955, lr: 0.000073, 1.9s 2023-06-12 15:00:11 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [30/100] train_loss: 0.3270, val_loss: 0.5155, val_auc: 0.9096, lr: 0.000072, 1.9s 2023-06-12 15:00:13 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [31/100] train_loss: 0.3277, val_loss: 0.5017, val_auc: 0.9084, lr: 0.000071, 1.9s 2023-06-12 15:00:15 | 
unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [32/100] train_loss: 0.3068, val_loss: 0.5567, val_auc: 0.9081, lr: 0.000070, 1.9s 2023-06-12 15:00:17 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [33/100] train_loss: 0.2974, val_loss: 0.6391, val_auc: 0.8909, lr: 0.000069, 1.9s 2023-06-12 15:00:19 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [34/100] train_loss: 0.3055, val_loss: 0.5395, val_auc: 0.9024, lr: 0.000068, 1.9s 2023-06-12 15:00:20 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [35/100] train_loss: 0.3441, val_loss: 0.7071, val_auc: 0.8945, lr: 0.000067, 1.9s 2023-06-12 15:00:22 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [36/100] train_loss: 0.3541, val_loss: 0.5885, val_auc: 0.8973, lr: 0.000066, 1.9s 2023-06-12 15:00:22 | unimol/utils/metrics.py | 270 | WARNING | Uni-Mol(QSAR) | Early stopping at epoch: 36 2023-06-12 15:00:23 | unimol/tasks/trainer.py | 197 | INFO | Uni-Mol(QSAR) | load model success! 
2023-06-12 15:00:23 | unimol/models/nnmodel.py | 123 | INFO | Uni-Mol(QSAR) | fold 0, result {'auc': 0.9182072829131652, 'auroc': 0.9182072829131652, 'auprc': 0.8699326788267808, 'log_loss': 0.45367900360845154, 'acc': 0.8429752066115702, 'f1_score': 0.831858407079646, 'mcc': 0.698723618744602, 'precision': 0.7580645161290323, 'recall': 0.9215686274509803, 'cohen_kappa': 0.6871683222207103, 'f1_bst': 0.831858407079646, 'acc_bst': 0.8429752066115702} 2023-06-12 15:00:24 | unimol/models/unimol.py | 107 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt 2023-06-12 15:00:26 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [1/100] train_loss: 0.6955, val_loss: 0.6758, val_auc: 0.6865, lr: 0.000033, 1.9s 2023-06-12 15:00:28 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [2/100] train_loss: 0.6846, val_loss: 0.7654, val_auc: 0.7665, lr: 0.000067, 1.9s 2023-06-12 15:00:30 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [3/100] train_loss: 0.6799, val_loss: 0.6400, val_auc: 0.7454, lr: 0.000100, 1.9s 2023-06-12 15:00:32 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [4/100] train_loss: 0.6598, val_loss: 0.8696, val_auc: 0.7462, lr: 0.000099, 1.9s 2023-06-12 15:00:34 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [5/100] train_loss: 0.6260, val_loss: 0.5507, val_auc: 0.8060, lr: 0.000098, 1.9s 2023-06-12 15:00:36 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [6/100] train_loss: 0.5894, val_loss: 0.5587, val_auc: 0.7959, lr: 0.000097, 1.9s 2023-06-12 15:00:38 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [7/100] train_loss: 0.5780, val_loss: 0.5275, val_auc: 0.8124, lr: 0.000096, 1.9s 2023-06-12 15:00:40 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [8/100] train_loss: 0.5666, val_loss: 0.5446, val_auc: 0.8516, lr: 0.000095, 1.9s 2023-06-12 
15:00:43 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [9/100] train_loss: 0.5667, val_loss: 0.4747, val_auc: 0.8495, lr: 0.000094, 1.9s 2023-06-12 15:00:45 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [10/100] train_loss: 0.5405, val_loss: 0.4588, val_auc: 0.8685, lr: 0.000093, 1.9s 2023-06-12 15:00:47 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [11/100] train_loss: 0.5388, val_loss: 0.4555, val_auc: 0.8710, lr: 0.000092, 1.9s 2023-06-12 15:00:49 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [12/100] train_loss: 0.5287, val_loss: 0.4537, val_auc: 0.8701, lr: 0.000091, 1.9s 2023-06-12 15:00:51 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [13/100] train_loss: 0.4653, val_loss: 0.4967, val_auc: 0.8790, lr: 0.000090, 1.9s 2023-06-12 15:00:54 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [14/100] train_loss: 0.4685, val_loss: 0.4524, val_auc: 0.8877, lr: 0.000089, 1.9s 2023-06-12 15:00:56 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [15/100] train_loss: 0.4453, val_loss: 0.4268, val_auc: 0.8889, lr: 0.000088, 1.9s 2023-06-12 15:00:58 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [16/100] train_loss: 0.4392, val_loss: 0.5215, val_auc: 0.8835, lr: 0.000087, 1.9s 2023-06-12 15:01:00 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [17/100] train_loss: 0.4216, val_loss: 0.4522, val_auc: 0.8864, lr: 0.000086, 1.9s 2023-06-12 15:01:02 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [18/100] train_loss: 0.4232, val_loss: 0.4503, val_auc: 0.9000, lr: 0.000085, 1.9s 2023-06-12 15:01:04 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [19/100] train_loss: 0.4433, val_loss: 0.6177, val_auc: 0.8857, lr: 0.000084, 1.9s 2023-06-12 15:01:06 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [20/100] train_loss: 0.3962, val_loss: 0.4760, val_auc: 0.8852, lr: 0.000082, 1.9s 2023-06-12 15:01:08 
| unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [21/100] train_loss: 0.3876, val_loss: 0.4913, val_auc: 0.8828, lr: 0.000081, 1.9s 2023-06-12 15:01:10 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [22/100] train_loss: 0.3761, val_loss: 0.5034, val_auc: 0.8806, lr: 0.000080, 1.9s 2023-06-12 15:01:12 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [23/100] train_loss: 0.3871, val_loss: 0.5234, val_auc: 0.8775, lr: 0.000079, 1.9s 2023-06-12 15:01:14 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [24/100] train_loss: 0.4044, val_loss: 0.5627, val_auc: 0.8928, lr: 0.000078, 1.9s 2023-06-12 15:01:16 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [25/100] train_loss: 0.3719, val_loss: 0.5097, val_auc: 0.8890, lr: 0.000077, 1.9s 2023-06-12 15:01:18 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [26/100] train_loss: 0.3687, val_loss: 0.5785, val_auc: 0.8927, lr: 0.000076, 1.9s 2023-06-12 15:01:19 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [27/100] train_loss: 0.3335, val_loss: 0.5258, val_auc: 0.8925, lr: 0.000075, 1.9s 2023-06-12 15:01:21 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [28/100] train_loss: 0.3340, val_loss: 0.5554, val_auc: 0.8901, lr: 0.000074, 1.9s 2023-06-12 15:01:21 | unimol/utils/metrics.py | 270 | WARNING | Uni-Mol(QSAR) | Early stopping at epoch: 28 2023-06-12 15:01:21 | unimol/tasks/trainer.py | 197 | INFO | Uni-Mol(QSAR) | load model success! 
2023-06-12 15:01:22 | unimol/models/nnmodel.py | 123 | INFO | Uni-Mol(QSAR) | fold 1, result {'auc': 0.8999722530521643, 'auroc': 0.8999722530521643, 'auprc': 0.8489596041035117, 'log_loss': 0.45574855122499724, 'acc': 0.8264462809917356, 'f1_score': 0.8037383177570094, 'mcc': 0.6482980887822377, 'precision': 0.7962962962962963, 'recall': 0.8113207547169812, 'cohen_kappa': 0.6482071161567216, 'f1_bst': 0.8037383177570094, 'acc_bst': 0.8264462809917356} 2023-06-12 15:01:22 | unimol/models/unimol.py | 107 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt 2023-06-12 15:01:24 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [1/100] train_loss: 0.7062, val_loss: 0.7216, val_auc: 0.6380, lr: 0.000033, 1.9s 2023-06-12 15:01:27 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [2/100] train_loss: 0.6766, val_loss: 0.6399, val_auc: 0.7406, lr: 0.000067, 1.9s 2023-06-12 15:01:29 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [3/100] train_loss: 0.6441, val_loss: 0.6279, val_auc: 0.7628, lr: 0.000100, 2.0s 2023-06-12 15:01:32 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [4/100] train_loss: 0.6494, val_loss: 0.8748, val_auc: 0.7887, lr: 0.000099, 1.9s 2023-06-12 15:01:34 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [5/100] train_loss: 0.6617, val_loss: 0.5479, val_auc: 0.7900, lr: 0.000098, 1.9s 2023-06-12 15:01:36 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [6/100] train_loss: 0.6092, val_loss: 0.5122, val_auc: 0.8215, lr: 0.000097, 1.9s 2023-06-12 15:01:39 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [7/100] train_loss: 0.6002, val_loss: 0.5951, val_auc: 0.8229, lr: 0.000096, 1.9s 2023-06-12 15:01:41 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [8/100] train_loss: 0.5757, val_loss: 0.5143, val_auc: 0.8389, lr: 0.000095, 1.9s 2023-06-12 
15:01:43 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [9/100] train_loss: 0.5159, val_loss: 0.5368, val_auc: 0.8494, lr: 0.000094, 1.9s 2023-06-12 15:01:46 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [10/100] train_loss: 0.4947, val_loss: 0.4801, val_auc: 0.8621, lr: 0.000093, 1.9s 2023-06-12 15:01:48 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [11/100] train_loss: 0.4722, val_loss: 0.5094, val_auc: 0.8742, lr: 0.000092, 1.9s 2023-06-12 15:01:50 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [12/100] train_loss: 0.4481, val_loss: 0.4703, val_auc: 0.8857, lr: 0.000091, 1.9s 2023-06-12 15:01:53 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [13/100] train_loss: 0.5248, val_loss: 0.5220, val_auc: 0.8792, lr: 0.000090, 1.9s 2023-06-12 15:01:55 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [14/100] train_loss: 0.5055, val_loss: 0.7263, val_auc: 0.8697, lr: 0.000089, 1.9s 2023-06-12 15:01:57 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [15/100] train_loss: 0.5307, val_loss: 0.4491, val_auc: 0.8810, lr: 0.000088, 1.9s 2023-06-12 15:01:58 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [16/100] train_loss: 0.4658, val_loss: 0.4881, val_auc: 0.8801, lr: 0.000087, 1.9s 2023-06-12 15:02:00 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [17/100] train_loss: 0.4440, val_loss: 0.5163, val_auc: 0.8688, lr: 0.000086, 1.9s 2023-06-12 15:02:02 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [18/100] train_loss: 0.4456, val_loss: 0.4665, val_auc: 0.8921, lr: 0.000085, 1.9s 2023-06-12 15:02:05 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [19/100] train_loss: 0.4145, val_loss: 0.5648, val_auc: 0.8652, lr: 0.000084, 2.0s 2023-06-12 15:02:07 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [20/100] train_loss: 0.4040, val_loss: 0.5029, val_auc: 0.8944, lr: 0.000082, 1.9s 2023-06-12 15:02:09 
| unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [21/100] train_loss: 0.3997, val_loss: 0.5478, val_auc: 0.8830, lr: 0.000081, 1.9s 2023-06-12 15:02:11 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [22/100] train_loss: 0.3973, val_loss: 0.5598, val_auc: 0.8785, lr: 0.000080, 1.9s 2023-06-12 15:02:13 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [23/100] train_loss: 0.3636, val_loss: 0.6002, val_auc: 0.8889, lr: 0.000079, 1.9s 2023-06-12 15:02:15 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [24/100] train_loss: 0.3930, val_loss: 0.5806, val_auc: 0.8826, lr: 0.000078, 1.9s 2023-06-12 15:02:17 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [25/100] train_loss: 0.3818, val_loss: 0.5400, val_auc: 0.8934, lr: 0.000077, 1.9s 2023-06-12 15:02:19 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [26/100] train_loss: 0.3972, val_loss: 0.4734, val_auc: 0.9076, lr: 0.000076, 1.9s 2023-06-12 15:02:21 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [27/100] train_loss: 0.3776, val_loss: 0.4733, val_auc: 0.9053, lr: 0.000075, 2.0s 2023-06-12 15:02:23 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [28/100] train_loss: 0.3567, val_loss: 0.5220, val_auc: 0.9021, lr: 0.000074, 1.9s 2023-06-12 15:02:25 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [29/100] train_loss: 0.3967, val_loss: 0.6430, val_auc: 0.8878, lr: 0.000073, 1.9s 2023-06-12 15:02:27 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [30/100] train_loss: 0.4211, val_loss: 0.5314, val_auc: 0.9017, lr: 0.000072, 1.9s 2023-06-12 15:02:29 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [31/100] train_loss: 0.3544, val_loss: 0.5186, val_auc: 0.9085, lr: 0.000071, 1.9s 2023-06-12 15:02:31 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [32/100] train_loss: 0.3446, val_loss: 0.5862, val_auc: 0.8929, lr: 0.000070, 1.9s 2023-06-12 15:02:33 | 
unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [33/100] train_loss: 0.4166, val_loss: 0.5491, val_auc: 0.8993, lr: 0.000069, 1.9s 2023-06-12 15:02:35 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [34/100] train_loss: 0.3339, val_loss: 0.5379, val_auc: 0.8988, lr: 0.000068, 1.9s 2023-06-12 15:02:37 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [35/100] train_loss: 0.3302, val_loss: 0.7157, val_auc: 0.9025, lr: 0.000067, 1.9s 2023-06-12 15:02:39 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [36/100] train_loss: 0.3182, val_loss: 0.6304, val_auc: 0.8965, lr: 0.000066, 1.9s 2023-06-12 15:02:41 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [37/100] train_loss: 0.3140, val_loss: 0.4813, val_auc: 0.9064, lr: 0.000065, 1.9s 2023-06-12 15:02:42 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [38/100] train_loss: 0.3360, val_loss: 0.6163, val_auc: 0.8988, lr: 0.000064, 2.0s 2023-06-12 15:02:44 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [39/100] train_loss: 0.2776, val_loss: 0.6821, val_auc: 0.9035, lr: 0.000063, 1.9s 2023-06-12 15:02:46 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [40/100] train_loss: 0.3242, val_loss: 0.8451, val_auc: 0.8978, lr: 0.000062, 1.9s 2023-06-12 15:02:48 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [41/100] train_loss: 0.2976, val_loss: 0.6851, val_auc: 0.9108, lr: 0.000061, 1.9s 2023-06-12 15:02:51 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [42/100] train_loss: 0.2789, val_loss: 0.6248, val_auc: 0.9082, lr: 0.000060, 1.9s 2023-06-12 15:02:53 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [43/100] train_loss: 0.2914, val_loss: 0.8050, val_auc: 0.8758, lr: 0.000059, 1.9s 2023-06-12 15:02:54 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [44/100] train_loss: 0.3001, val_loss: 0.7139, val_auc: 0.8971, lr: 0.000058, 1.9s 2023-06-12 15:02:56 | 
unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [45/100] train_loss: 0.2725, val_loss: 0.8069, val_auc: 0.8980, lr: 0.000057, 1.9s 2023-06-12 15:02:58 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [46/100] train_loss: 0.3146, val_loss: 0.9082, val_auc: 0.8881, lr: 0.000056, 1.9s 2023-06-12 15:03:00 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [47/100] train_loss: 0.2710, val_loss: 0.8542, val_auc: 0.8801, lr: 0.000055, 1.9s 2023-06-12 15:03:02 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [48/100] train_loss: 0.2740, val_loss: 0.6918, val_auc: 0.8858, lr: 0.000054, 1.9s 2023-06-12 15:03:04 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [49/100] train_loss: 0.2381, val_loss: 0.8474, val_auc: 0.8756, lr: 0.000053, 1.9s 2023-06-12 15:03:06 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [50/100] train_loss: 0.2274, val_loss: 0.8547, val_auc: 0.8979, lr: 0.000052, 1.9s 2023-06-12 15:03:08 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [51/100] train_loss: 0.2644, val_loss: 0.9796, val_auc: 0.8911, lr: 0.000051, 1.9s 2023-06-12 15:03:08 | unimol/utils/metrics.py | 270 | WARNING | Uni-Mol(QSAR) | Early stopping at epoch: 51 2023-06-12 15:03:08 | unimol/tasks/trainer.py | 197 | INFO | Uni-Mol(QSAR) | load model success! 
2023-06-12 15:03:08 | unimol/models/nnmodel.py | 123 | INFO | Uni-Mol(QSAR) | fold 2, result {'auc': 0.9107935627081021, 'auroc': 0.9107935627081021, 'auprc': 0.8387708802639177, 'log_loss': 0.6957129506578992, 'acc': 0.8305785123966942, 'f1_score': 0.8255319148936171, 'mcc': 0.676035670849555, 'precision': 0.751937984496124, 'recall': 0.9150943396226415, 'cohen_kappa': 0.6639116591016869, 'f1_bst': 0.8255319148936171, 'acc_bst': 0.8305785123966942} 2023-06-12 15:03:09 | unimol/models/unimol.py | 107 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt 2023-06-12 15:03:11 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [1/100] train_loss: 0.7000, val_loss: 0.6772, val_auc: 0.7784, lr: 0.000033, 1.9s 2023-06-12 15:03:13 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [2/100] train_loss: 0.6924, val_loss: 0.6510, val_auc: 0.7463, lr: 0.000067, 1.9s 2023-06-12 15:03:15 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [3/100] train_loss: 0.6922, val_loss: 0.8150, val_auc: 0.7965, lr: 0.000100, 1.9s 2023-06-12 15:03:18 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [4/100] train_loss: 0.6784, val_loss: 0.6150, val_auc: 0.8055, lr: 0.000099, 1.9s 2023-06-12 15:03:20 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [5/100] train_loss: 0.7005, val_loss: 0.6191, val_auc: 0.8101, lr: 0.000098, 1.9s 2023-06-12 15:03:22 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [6/100] train_loss: 0.6547, val_loss: 0.5618, val_auc: 0.8335, lr: 0.000097, 1.9s 2023-06-12 15:03:25 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [7/100] train_loss: 0.5813, val_loss: 0.5119, val_auc: 0.8490, lr: 0.000096, 1.9s 2023-06-12 15:03:27 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [8/100] train_loss: 0.5688, val_loss: 0.5325, val_auc: 0.8455, lr: 0.000095, 2.0s 2023-06-12 
15:03:29 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [9/100] train_loss: 0.5408, val_loss: 0.4517, val_auc: 0.8628, lr: 0.000094, 1.9s 2023-06-12 15:03:31 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [10/100] train_loss: 0.5628, val_loss: 0.4236, val_auc: 0.8793, lr: 0.000093, 1.9s 2023-06-12 15:03:34 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [11/100] train_loss: 0.5199, val_loss: 0.4823, val_auc: 0.8684, lr: 0.000092, 1.9s 2023-06-12 15:03:35 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [12/100] train_loss: 0.4936, val_loss: 0.4763, val_auc: 0.8736, lr: 0.000091, 1.9s 2023-06-12 15:03:37 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [13/100] train_loss: 0.4872, val_loss: 0.6061, val_auc: 0.8833, lr: 0.000090, 1.9s 2023-06-12 15:03:40 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [14/100] train_loss: 0.5311, val_loss: 0.5340, val_auc: 0.8597, lr: 0.000089, 2.0s 2023-06-12 15:03:42 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [15/100] train_loss: 0.4856, val_loss: 0.5720, val_auc: 0.8536, lr: 0.000088, 1.9s 2023-06-12 15:03:44 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [16/100] train_loss: 0.4864, val_loss: 0.5353, val_auc: 0.8725, lr: 0.000087, 2.0s 2023-06-12 15:03:46 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [17/100] train_loss: 0.4687, val_loss: 0.5024, val_auc: 0.8842, lr: 0.000086, 2.0s 2023-06-12 15:03:48 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [18/100] train_loss: 0.4314, val_loss: 0.5423, val_auc: 0.8737, lr: 0.000085, 1.9s 2023-06-12 15:03:50 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [19/100] train_loss: 0.4419, val_loss: 0.4978, val_auc: 0.8843, lr: 0.000084, 2.0s 2023-06-12 15:03:52 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [20/100] train_loss: 0.3769, val_loss: 0.4936, val_auc: 0.8865, lr: 0.000082, 1.9s 2023-06-12 15:03:55 
| unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [21/100] train_loss: 0.3875, val_loss: 0.5564, val_auc: 0.8814, lr: 0.000081, 1.9s 2023-06-12 15:03:57 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [22/100] train_loss: 0.3746, val_loss: 0.6387, val_auc: 0.8843, lr: 0.000080, 1.9s 2023-06-12 15:03:58 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [23/100] train_loss: 0.3776, val_loss: 0.5894, val_auc: 0.8694, lr: 0.000079, 1.9s 2023-06-12 15:04:00 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [24/100] train_loss: 0.4095, val_loss: 0.5246, val_auc: 0.8796, lr: 0.000078, 2.0s 2023-06-12 15:04:02 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [25/100] train_loss: 0.3767, val_loss: 0.6057, val_auc: 0.8711, lr: 0.000077, 2.0s 2023-06-12 15:04:04 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [26/100] train_loss: 0.3716, val_loss: 0.5677, val_auc: 0.8731, lr: 0.000076, 1.9s 2023-06-12 15:04:06 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [27/100] train_loss: 0.3712, val_loss: 0.6539, val_auc: 0.8539, lr: 0.000075, 2.0s 2023-06-12 15:04:08 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [28/100] train_loss: 0.3664, val_loss: 0.6351, val_auc: 0.8550, lr: 0.000074, 1.9s 2023-06-12 15:04:10 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [29/100] train_loss: 0.4016, val_loss: 0.6773, val_auc: 0.8524, lr: 0.000073, 2.0s 2023-06-12 15:04:12 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [30/100] train_loss: 0.3692, val_loss: 0.5979, val_auc: 0.8725, lr: 0.000072, 1.9s 2023-06-12 15:04:12 | unimol/utils/metrics.py | 270 | WARNING | Uni-Mol(QSAR) | Early stopping at epoch: 30 2023-06-12 15:04:12 | unimol/tasks/trainer.py | 197 | INFO | Uni-Mol(QSAR) | load model success! 
2023-06-12 15:04:12 | unimol/models/nnmodel.py | 123 | INFO | Uni-Mol(QSAR) | fold 3, result {'auc': 0.8864868721461187, 'auroc': 0.8864868721461187, 'auprc': 0.8247679586569595, 'log_loss': 0.49170930682843134, 'acc': 0.8099173553719008, 'f1_score': 0.77, 'mcc': 0.6098848569005575, 'precision': 0.7403846153846154, 'recall': 0.8020833333333334, 'cohen_kappa': 0.6084693303320203, 'f1_bst': 0.77, 'acc_bst': 0.8099173553719008} 2023-06-12 15:04:13 | unimol/models/unimol.py | 107 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt 2023-06-12 15:04:15 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [1/100] train_loss: 0.6895, val_loss: 0.6836, val_auc: 0.7137, lr: 0.000033, 2.0s 2023-06-12 15:04:17 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [2/100] train_loss: 0.6825, val_loss: 0.6159, val_auc: 0.7670, lr: 0.000067, 1.9s 2023-06-12 15:04:20 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [3/100] train_loss: 0.6651, val_loss: 0.6295, val_auc: 0.8111, lr: 0.000100, 1.9s 2023-06-12 15:04:22 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [4/100] train_loss: 0.6799, val_loss: 0.5340, val_auc: 0.8727, lr: 0.000099, 1.9s 2023-06-12 15:04:25 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [5/100] train_loss: 0.6522, val_loss: 0.5009, val_auc: 0.8665, lr: 0.000098, 2.0s 2023-06-12 15:04:26 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [6/100] train_loss: 0.6008, val_loss: 0.4515, val_auc: 0.8943, lr: 0.000097, 1.9s 2023-06-12 15:04:29 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [7/100] train_loss: 0.5415, val_loss: 0.3851, val_auc: 0.9023, lr: 0.000096, 1.9s 2023-06-12 15:04:31 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [8/100] train_loss: 0.5484, val_loss: 0.3656, val_auc: 0.9119, lr: 0.000095, 1.9s 2023-06-12 15:04:33 | 
unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [9/100] train_loss: 0.5258, val_loss: 0.4268, val_auc: 0.9135, lr: 0.000094, 1.9s 2023-06-12 15:04:36 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [10/100] train_loss: 0.5220, val_loss: 0.3631, val_auc: 0.9149, lr: 0.000093, 1.9s 2023-06-12 15:04:38 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [11/100] train_loss: 0.5316, val_loss: 0.3722, val_auc: 0.9156, lr: 0.000092, 1.9s 2023-06-12 15:04:40 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [12/100] train_loss: 0.4923, val_loss: 0.2957, val_auc: 0.9465, lr: 0.000091, 1.9s 2023-06-12 15:04:43 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [13/100] train_loss: 0.4781, val_loss: 0.5966, val_auc: 0.9242, lr: 0.000090, 2.0s 2023-06-12 15:04:45 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [14/100] train_loss: 0.5368, val_loss: 0.3442, val_auc: 0.9344, lr: 0.000089, 1.9s 2023-06-12 15:04:47 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [15/100] train_loss: 0.5019, val_loss: 0.3224, val_auc: 0.9276, lr: 0.000088, 1.9s 2023-06-12 15:04:49 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [16/100] train_loss: 0.4800, val_loss: 0.3078, val_auc: 0.9371, lr: 0.000087, 1.9s 2023-06-12 15:04:51 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [17/100] train_loss: 0.4542, val_loss: 0.3903, val_auc: 0.9214, lr: 0.000086, 1.9s 2023-06-12 15:04:52 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [18/100] train_loss: 0.4114, val_loss: 0.3489, val_auc: 0.9276, lr: 0.000085, 1.9s 2023-06-12 15:04:54 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [19/100] train_loss: 0.4273, val_loss: 0.4122, val_auc: 0.9294, lr: 0.000084, 1.9s 2023-06-12 15:04:56 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [20/100] train_loss: 0.3946, val_loss: 0.3979, val_auc: 0.9148, lr: 0.000082, 1.9s 2023-06-12 15:04:58 | 
unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [21/100] train_loss: 0.3945, val_loss: 0.3545, val_auc: 0.9304, lr: 0.000081, 1.9s
2023-06-12 15:05:00 | unimol/tasks/trainer.py | 156 | INFO | Uni-Mol(QSAR) | Epoch [22/100] train_loss: 0.4037, val_loss: 0.5166, val_auc: 0.9230, lr: 0.000080, 1.9s
2023-06-12 15:05:00 | unimol/utils/metrics.py | 270 | WARNING | Uni-Mol(QSAR) | Early stopping at epoch: 22
2023-06-12 15:05:00 | unimol/tasks/trainer.py | 197 | INFO | Uni-Mol(QSAR) | load model success!
2023-06-12 15:05:01 | unimol/models/nnmodel.py | 123 | INFO | Uni-Mol(QSAR) | fold 4, result {'auc': 0.9465415363225582, 'auroc': 0.9465415363225582, 'auprc': 0.9283302239610741, 'log_loss': 0.3047249227462044, 'acc': 0.859504132231405, 'f1_score': 0.8411214953271028, 'mcc': 0.7156760410088872, 'precision': 0.8256880733944955, 'recall': 0.8571428571428571, 'cohen_kappa': 0.7152744134542183, 'f1_bst': 0.8411214953271028, 'acc_bst': 0.859504132231405}
2023-06-12 15:05:01 | unimol/models/nnmodel.py | 135 | INFO | Uni-Mol(QSAR) | Uni-Mol metrics score: {'auc': 0.8945449465670182, 'auroc': 0.8945449465670182, 'auprc': 0.8168464936643415, 'log_loss': 0.48031494701319677, 'acc': 0.8338842975206612, 'f1_score': 0.815426997245179, 'mcc': 0.668452163193527, 'precision': 0.7735191637630662, 'recall': 0.8621359223300971, 'cohen_kappa': 0.6652167329690146, 'f1_bst': 0.815426997245179, 'acc_bst': 0.8338842975206612}
2023-06-12 15:05:01 | unimol/models/nnmodel.py | 136 | INFO | Uni-Mol(QSAR) | Uni-Mol & Metric result saved!
2023-06-12 15:05:01 | unimol/utils/metrics.py | 288 | INFO | Uni-Mol(QSAR) | metrics for threshold: accuracy_score
2023-06-12 15:05:01 | unimol/utils/metrics.py | 301 | INFO | Uni-Mol(QSAR) | best threshold: 0.31653441677458194, metrics: 0.8380165289256198
2023-06-12 15:05:01 | unimol/data/conformer.py | 56 | INFO | Uni-Mol(QSAR) | Start generating conformers...
303it [00:12, 25.08it/s]
2023-06-12 15:05:13 | unimol/data/conformer.py | 60 | INFO | Uni-Mol(QSAR) | Failed to generate conformers for 0.00% of molecules.
2023-06-12 15:05:13 | unimol/data/conformer.py | 62 | INFO | Uni-Mol(QSAR) | Failed to generate 3d conformers for 0.00% of molecules.
2023-06-12 15:05:14 | unimol/models/unimol.py | 107 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2023-06-12 15:05:14 | unimol/models/nnmodel.py | 145 | INFO | Uni-Mol(QSAR) | start predict NNModel:unimolv1
2023-06-12 15:05:14 | unimol/tasks/trainer.py | 197 | INFO | Uni-Mol(QSAR) | load model success!
2023-06-12 15:05:15 | unimol/tasks/trainer.py | 197 | INFO | Uni-Mol(QSAR) | load model success!
2023-06-12 15:05:16 | unimol/tasks/trainer.py | 197 | INFO | Uni-Mol(QSAR) | load model success!
2023-06-12 15:05:17 | unimol/tasks/trainer.py | 197 | INFO | Uni-Mol(QSAR) | load model success!
2023-06-12 15:05:17 | unimol/tasks/trainer.py | 197 | INFO | Uni-Mol(QSAR) | load model success!
2023-06-12 15:05:18 | unimol/predict.py | 66 | INFO | Uni-Mol(QSAR) | final predict metrics score: {'auc': 0.7684770937723694, 'auroc': 0.7684770937723694, 'auprc': 0.7971093056207901, 'log_loss': 0.7244570761943611, 'acc': 0.6732673267326733, 'f1_score': 0.6990881458966567, 'mcc': 0.34955144808478406, 'precision': 0.7516339869281046, 'recall': 0.6534090909090909, 'cohen_kappa': 0.3454866793218564, 'f1_bst': 0.6990881458966567, 'acc_bst': 0.6732673267326733}
[Uni-Mol] ACC:0.6733 AUC:0.7685
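The log above loads five fold checkpoints before reporting the final test metrics, and it also reports a "best threshold" chosen for `accuracy_score`. The sketch below illustrates how numbers of this kind can be computed. It assumes the fold predictions are combined by simple averaging, which the log does not confirm. The helper names (`average_folds`, `accuracy`, `roc_auc`, `best_threshold`) and the toy data are hypothetical, and the metrics are written in plain Python only to keep the example self-contained; in practice scikit-learn's `accuracy_score` and `roc_auc_score` would normally be used.

```python
from statistics import mean

def average_folds(fold_probs):
    """Average per-fold probabilities molecule by molecule (assumed ensembling rule)."""
    return [mean(column) for column in zip(*fold_probs)]

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of molecules whose thresholded prediction matches the label."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    return sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)

def roc_auc(y_true, y_prob):
    """AUC as the chance that a random active outranks a random inactive (ties count 0.5)."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def best_threshold(y_true, y_prob):
    """Scan the observed probabilities and keep the cut-off with the highest accuracy."""
    return max(sorted(set(y_prob)), key=lambda t: accuracy(y_true, y_prob, t))

# Toy, made-up labels and per-fold probabilities (not taken from the BACE-1 run).
y_true = [1, 0, 1, 0, 1, 0]
fold_probs = [
    [0.875, 0.25, 0.5, 0.125, 0.75, 0.625],   # hypothetical fold-0 model
    [0.625, 0.5, 0.25, 0.375, 0.75, 0.375],   # hypothetical fold-1 model
]

y_prob = average_folds(fold_probs)
threshold = best_threshold(y_true, y_prob)
print(f"ACC@0.5: {accuracy(y_true, y_prob):.4f}")
print(f"AUC: {roc_auc(y_true, y_prob):.4f}")
print(f"best threshold: {threshold}, ACC: {accuracy(y_true, y_prob, threshold):.4f}")
```

AUC depends only on how the molecules are ranked, while ACC depends on the chosen cut-off, so the two metrics can order models differently.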
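Results Overview
Finally, we can compare side by side how 1D-QSAR, 2D-QSAR, and 3D-QSAR, each paired with different machine learning models, and Uni-Mol perform on the same dataset. The rows below are sorted by AUC in descending order.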
Model | ACC | AUC |
---|---|---|
Uni-Mol | 0.673267 | 0.768477 |
2D-QSAR-Random Forest | 0.660066 | 0.765994 |
2D-QSAR-Logistic Regression | 0.679868 | 0.758187 |
2D-QSAR-XGBoost | 0.673267 | 0.75293 |
2D-QSAR-Stochastic Gradient Descent | 0.686469 | 0.742439 |
2D-QSAR-K-Nearest Neighbors | 0.716172 | 0.741902 |
3D-QSAR-Random Forest | 0.660066 | 0.739531 |
2D-QSAR-Bernoulli Naive Bayes | 0.656766 | 0.73291 |
3D-QSAR-XGBoost | 0.643564 | 0.729018 |
2D-QSAR-Multi-layer Perceptron | 0.660066 | 0.713762 |
3D-QSAR-K-Nearest Neighbors | 0.60396 | 0.70081 |
1D-QSAR-XGBoost | 0.650165 | 0.683227 |
1D-QSAR-Random Forest | 0.617162 | 0.679022 |
3D-QSAR-Bernoulli Naive Bayes | 0.656766 | 0.665824 |
3D-QSAR-Multi-layer Perceptron | 0.630363 | 0.65238 |
1D-QSAR-Multi-layer Perceptron | 0.570957 | 0.648756 |
1D-QSAR-Logistic Regression | 0.564356 | 0.639607 |
2D-QSAR-Decision Tree | 0.613861 | 0.636901 |
1D-QSAR-K-Nearest Neighbors | 0.60396 | 0.628691 |
1D-QSAR-Decision Tree | 0.584158 | 0.59639 |
3D-QSAR-Decision Tree | 0.59736 | 0.590551 |
1D-QSAR-Stochastic Gradient Descent | 0.419142 | 0.502841 |
3D-QSAR-Logistic Regression | 0.419142 | 0.5 |
1D-QSAR-Bernoulli Naive Bayes | 0.409241 | 0.457073 |
3D-QSAR-Stochastic Gradient Descent | 0.376238 | 0.37849 |
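To read the table by representation rather than row by row, the sketch below groups the rows by their name prefix and keeps the best AUC in each group. The numbers are copied from the table above. The `family` helper and the grouping rule are assumptions made for illustration and are not part of the original notebook.

```python
# (model name, ACC, AUC) copied from the results table above.
results = [
    ("Uni-Mol", 0.673267, 0.768477),
    ("2D-QSAR-Random Forest", 0.660066, 0.765994),
    ("2D-QSAR-Logistic Regression", 0.679868, 0.758187),
    ("2D-QSAR-XGBoost", 0.673267, 0.75293),
    ("2D-QSAR-Stochastic Gradient Descent", 0.686469, 0.742439),
    ("2D-QSAR-K-Nearest Neighbors", 0.716172, 0.741902),
    ("3D-QSAR-Random Forest", 0.660066, 0.739531),
    ("2D-QSAR-Bernoulli Naive Bayes", 0.656766, 0.73291),
    ("3D-QSAR-XGBoost", 0.643564, 0.729018),
    ("2D-QSAR-Multi-layer Perceptron", 0.660066, 0.713762),
    ("3D-QSAR-K-Nearest Neighbors", 0.60396, 0.70081),
    ("1D-QSAR-XGBoost", 0.650165, 0.683227),
    ("1D-QSAR-Random Forest", 0.617162, 0.679022),
    ("3D-QSAR-Bernoulli Naive Bayes", 0.656766, 0.665824),
    ("3D-QSAR-Multi-layer Perceptron", 0.630363, 0.65238),
    ("1D-QSAR-Multi-layer Perceptron", 0.570957, 0.648756),
    ("1D-QSAR-Logistic Regression", 0.564356, 0.639607),
    ("2D-QSAR-Decision Tree", 0.613861, 0.636901),
    ("1D-QSAR-K-Nearest Neighbors", 0.60396, 0.628691),
    ("1D-QSAR-Decision Tree", 0.584158, 0.59639),
    ("3D-QSAR-Decision Tree", 0.59736, 0.590551),
    ("1D-QSAR-Stochastic Gradient Descent", 0.419142, 0.502841),
    ("3D-QSAR-Logistic Regression", 0.419142, 0.5),
    ("1D-QSAR-Bernoulli Naive Bayes", 0.409241, 0.457073),
    ("3D-QSAR-Stochastic Gradient Descent", 0.376238, 0.37849),
]

def family(name):
    """Return '1D-QSAR', '2D-QSAR' or '3D-QSAR'; other names (Uni-Mol) are their own family."""
    return name[:7] if name[1:7] == "D-QSAR" else name

# Keep the row with the highest AUC for each representation family.
best = {}
for name, acc, auc in results:
    fam = family(name)
    if fam not in best or auc > best[fam][2]:
        best[fam] = (name, acc, auc)

for fam, (name, acc, auc) in sorted(best.items(), key=lambda kv: -kv[1][2]):
    print(f"{fam:8s} best AUC {auc:.3f} ({name}, ACC {acc:.3f})")

# The ranking depends on the metric: the top row by ACC differs from the top row by AUC.
top_acc = max(results, key=lambda row: row[1])
print(f"highest ACC: {top_acc[0]} ({top_acc[1]:.3f})")
```

On these numbers the best AUC per family comes from Uni-Mol, then 2D-QSAR, then 3D-QSAR, then 1D-QSAR, while the highest ACC belongs to a 2D-QSAR model. These are results from a single run on a single dataset, so they are best read as indicative only.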
dengb@dp.tech
zhengh@dp.tech