Tags: Uni-Mol, notebook, Tutorial, Chinese, Machine Learning, QSAR
OvO
Updated 2024-08-26
Recommended image: unimol-qsar:v0.2
Recommended machine type: c12_m92_1 * NVIDIA V100
Molecular Vector Representations with Uni-Mol

©️ Copyright 2023 @ Authors
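Author: 高志锋 (Gao Zhifeng)
Date: 2023-06-06
License: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Quick start: click the "Connect" button above, select the unimol-qsar:v0.2 image and any GPU node configuration, and wait a moment; the notebook is then ready to run.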

```python
# Import Uni-Mol
from unimol import UniMolRepr
import numpy as np
import pandas as pd
```
/opt/conda/lib/python3.8/site-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
```python
import os

# Compute the Uni-Mol representation one SMILES at a time and save each to its own .npy file
clf = UniMolRepr(data_type='molecule')
df = pd.read_csv('/personal/compounddata.csv')
out_dir = '/personal/bindingdb3/'
os.makedirs(out_dir, exist_ok=True)  # np.save does not create missing directories
error_ids = []
# for index, row in df.iterrows():  # process every row
for index, row in df.iloc[26555:].iterrows():  # only rows from position 26555 onward
    compound_id = row['COMPOUND_ID']
    compound_smiles = row['COMPOUND_SMILES']
    # Example input: smiles = 'CN1CCC(C(C1)O)C2=C(C=C(C3=C2OC(=CC3=O)C4=CC=CC=C4Cl)O)O', id = '5287969'
    smiles_list = [compound_smiles]
    try:
        # "cls_repr" holds one whole-molecule vector per input SMILES
        unimol_repr = np.array(clf.get_repr(smiles_list)["cls_repr"])
        print(unimol_repr.shape)
        print(str(compound_id))
        np.save(os.path.join(out_dir, str(compound_id) + '.npy'), unimol_repr)
    except Exception as e:
        print(f"Error processing compound id {compound_id}: {e}")
        # Record the ID that failed
        error_ids.append(compound_id)

# Save the failed IDs to a CSV file
error_df = pd.DataFrame(error_ids, columns=['Error_ID'])
error_df.to_csv('/personal/failed.csv', index=False)
```
2024-07-26 09:28:52 | unimol/models/unimol.py | 116 | INFO | Uni-Mol(QSAR) | Loading pretrained weights from /opt/conda/lib/python3.8/site-packages/unimol-0.0.2-py3.8.egg/unimol/weights/mol_pre_all_h_220816.pt
2024-07-26 09:28:56 | unimol/data/conformer.py | 62 | INFO | Uni-Mol(QSAR) | Start generating conformers...
0it [00:00, ?it/s]
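The loop above writes one `.npy` file per compound and records failures in `failed.csv`. Before modelling, those files have to be gathered back into one matrix. A minimal sketch, assuming each file is named `<COMPOUND_ID>.npy` and holds a `(1, d)` array as written above; `load_saved_reprs` is a hypothetical helper, not part of Uni-Mol.

```python
import os

import numpy as np


def load_saved_reprs(repr_dir, compound_ids, failed_ids=()):
    """Stack per-compound .npy files into one (n, d) matrix (hypothetical helper).

    Assumes each file is '<compound_id>.npy' holding a (1, d) array.
    Returns the IDs that were loaded and the row-aligned matrix.
    """
    failed = {str(i) for i in failed_ids}
    kept_ids, rows = [], []
    for cid in compound_ids:
        cid = str(cid)
        path = os.path.join(repr_dir, cid + '.npy')
        if cid in failed or not os.path.exists(path):
            continue  # skip compounds that errored or were never written
        rows.append(np.load(path).reshape(-1))  # (1, d) -> (d,)
        kept_ids.append(cid)
    matrix = np.vstack(rows) if rows else np.empty((0, 0))
    return kept_ids, matrix
```

For example, `load_saved_reprs('/personal/bindingdb3', df['COMPOUND_ID'], pd.read_csv('/personal/failed.csv')['Error_ID'])` would return the IDs that have a saved vector together with their stacked representations, in the same order.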
```
%%bash
# Download the sample data: CNS drug data
rm -rf mol_train.csv
wget -nv https://bohrium-example.oss-cn-zhangjiakou.aliyuncs.com/unimol-qsar/mol_train.csv
```
2023-06-12 16:10:30 URL:https://bohrium-example.oss-cn-zhangjiakou.aliyuncs.com/unimol-qsar/mol_train.csv [30600/30600] -> "mol_train.csv" [1]
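```python
# Read the sample file once; y is a NumPy array so it can be used to build boolean masks
train_df = pd.read_csv('mol_train.csv')
smiles_list = train_df['SMILES'].to_list()
y = train_df['TARGET'].to_numpy()
# Encode all molecules in one call; "cls_repr" gives one vector per molecule
unimol_repr_list = np.array(clf.get_repr(smiles_list)["cls_repr"])
```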
2023-06-12 16:10:31 | unimol/data/conformer.py | 56 | INFO | Uni-Mol(QSAR) | Start generating conformers...
700it [00:10, 66.34it/s]
2023-06-12 16:10:41 | unimol/data/conformer.py | 60 | INFO | Uni-Mol(QSAR) | Failed to generate conformers for 0.00% of molecules.
2023-06-12 16:10:41 | unimol/data/conformer.py | 62 | INFO | Uni-Mol(QSAR) | Failed to generate 3d conformers for 0.00% of molecules.
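The call above encodes all 700 molecules at once, while the earlier loop calls `get_repr` once per molecule. For large files a middle ground is to encode in chunks. This is a sketch under assumptions: `encode_in_chunks` and `chunk_size` are hypothetical names, and the retry logic only helps if `get_repr` raises on an unusable SMILES. The log lines above report separate failure rates for conformers and 3D conformers, which suggests Uni-Mol may fall back instead of raising, in which case nothing is retried.

```python
import numpy as np


def encode_in_chunks(smiles_list, encode, chunk_size=256):
    """Encode SMILES chunk by chunk, retrying one by one if a chunk raises.

    `encode` is any callable mapping a list of SMILES to an (n, d) array
    (hypothetical wrapper around a Uni-Mol call). Returns
    (kept_smiles, matrix, failed_smiles) with rows aligned to kept_smiles.
    """
    kept, blocks, failed = [], [], []
    for start in range(0, len(smiles_list), chunk_size):
        chunk = list(smiles_list[start:start + chunk_size])
        try:
            blocks.append(np.asarray(encode(chunk)))
            kept.extend(chunk)
        except Exception:
            # Retry molecule by molecule so one bad SMILES does not sink the chunk
            for smi in chunk:
                try:
                    blocks.append(np.asarray(encode([smi])))
                    kept.append(smi)
                except Exception:
                    failed.append(smi)
    matrix = np.vstack(blocks) if blocks else np.empty((0, 0))
    return kept, matrix, failed
```

With the notebook's objects, a call could look like `encode_in_chunks(smiles_list, lambda s: np.array(clf.get_repr(s)["cls_repr"]))`. Passing the encoder as a callable keeps the chunking logic independent of the Uni-Mol API.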
                                                    
```python
print(unimol_repr_list.shape)
```
(700, 512)
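The shape `(700, 512)` means one 512-dimensional vector per molecule. One common use of such vectors is similarity search. A minimal sketch, assuming cosine similarity is a reasonable metric for these CLS vectors (the notebook does not state this); `cosine_top_k` is a hypothetical helper.

```python
import numpy as np


def cosine_top_k(reprs, query_index, k=5):
    """Indices and cosine similarities of the k rows closest to row `query_index`."""
    X = np.asarray(reprs, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X_unit = X / np.clip(norms, 1e-12, None)  # guard against zero vectors
    sims = X_unit @ X_unit[query_index]       # (n,) cosine similarities
    sims[query_index] = -np.inf               # exclude the query itself
    order = np.argsort(-sims)[:k]             # highest similarity first
    return order, sims[order]
```

For example, `cosine_top_k(unimol_repr_list, 0, k=5)` would list the five molecules whose representations are closest to that of the first SMILES in `mol_train.csv`.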
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
```
```python
# Reduce the 512-dimensional representations to 2 principal components for plotting
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(unimol_repr_list)
```
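`PCA.fit_transform` centres the data and projects it onto the directions of largest variance. The same projection can be computed from the SVD of the centred matrix, `X_c = U S Vt`: the scores are `X_c @ Vt[:k].T` and the explained-variance ratio of component i is `S[i]**2 / sum(S**2)`. The NumPy sketch below is for illustration only (`pca_svd` is a hypothetical name); it should agree with scikit-learn up to the sign of each component, because singular vectors are only defined up to sign.

```python
import numpy as np


def pca_svd(X, n_components=2):
    """Project rows of X onto the top principal components via SVD of the centred data."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                        # PCA operates on mean-centred data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T              # coordinates in principal-component space
    explained = S**2 / np.sum(S**2)                # variance ratio of each component
    return scores, explained[:n_components]
```

Calling `pca_svd(unimol_repr_list, 2)` should reproduce `X_reduced` and `pca.explained_variance_ratio_` up to per-axis sign flips.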
```python
# Visualization: PCA projection colored by TARGET label
colors = ['r', 'g', 'b']
markers = ['s', 'o', 'D']
labels = ['Target:0', 'Target:1']

plt.figure(figsize=(8, 6))

# One scatter series per class; zip stops at the shortest sequence (the unique labels)
for label, color, marker in zip(np.unique(y), colors, markers):
    mask = np.asarray(y) == label  # rows belonging to this class
    plt.scatter(X_reduced[mask, 0],
                X_reduced[mask, 1],
                c=color,
                marker=marker,
                label=labels[label],
                edgecolors='black')

plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend(loc='best')
plt.title('Unimol Repr')
plt.show()
```
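The 2D PCA plot only shows the two directions of largest variance, which need not be the directions that separate the two TARGET classes. One way to check how much label information the full 512-dimensional vectors carry is a cross-validated linear probe. This is an addition, not part of the original notebook: a logistic-regression model on standardized features, scored by ROC-AUC over shuffled stratified folds; `linear_probe_auc` is a hypothetical helper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def linear_probe_auc(reprs, labels, folds=5, seed=0):
    """Mean and std of cross-validated ROC-AUC for a logistic-regression probe."""
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=seed)
    scores = cross_val_score(model, np.asarray(reprs), np.asarray(labels),
                             cv=cv, scoring='roc_auc')
    return scores.mean(), scores.std()
```

With the notebook's variables this would be `linear_probe_auc(unimol_repr_list, y)`; an AUC well above 0.5 would indicate that the representations separate the classes even where the PCA plot overlaps.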