Molecular Vector Representation Based on Uni-Mol
Tags: Uni-Mol, Deep Learning
Author: Yani Guan
Updated: 2024-10-24
Recommended image: Uni-Mol:unimol-qsar:v0.5
Recommended machine type: c3_m4_1 * NVIDIA T4

What Is a Molecular/Atomic Vector Representation?

  • A molecular or atomic vector representation encodes the chemical and physical properties of atoms or molecules as numerical vectors. Such representations are central to cheminformatics, computational chemistry, drug design, and materials science.

  • An atomic-level vector can include atom type, electronegativity, atomic radius, electron shell structure, charge distribution, and bond type. For example, a carbon atom might be encoded as [6, 2.55, 0.77, [2, 4], 0.0, 1]. Here 6 is the atomic number, 2.55 the Pauling electronegativity, 0.77 the covalent radius (Å), [2, 4] the shell occupancy, 0.0 the charge, and 1 a bond-type code.

  • A molecular-level vector can cover structural information, electronic properties, geometry, spectral and thermodynamic properties, and reactivity. For example, a water molecule might be encoded as [18.015, 1.85, [0.957, 104.5], [0.34, 0.17, 1.85], -75.0]. In that vector, 18.015 is the molar mass (g/mol) and 1.85 the dipole moment (D). The pair [0.957, 104.5] gives the O-H bond length (Å) and the H-O-H bond angle (degrees).
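The example vectors above are nested: the carbon vector contains a shell-occupancy sub-list, for instance. Most machine-learning models expect a flat, fixed-length numeric vector, so nested entries are usually expanded first. The sketch below does this with a hypothetical helper, `flatten_features`, applied to the two example vectors from the text. It assumes every molecule or atom in a dataset uses the same nested layout; otherwise the flattened lengths would differ.

```python
from typing import List, Sequence


def flatten_features(vec: Sequence) -> List[float]:
    """Recursively flatten a nested feature list into a flat list of floats.

    Hypothetical helper for illustration; assumes only lists/tuples are nested.
    """
    flat: List[float] = []
    for item in vec:
        if isinstance(item, (list, tuple)):
            flat.extend(flatten_features(item))  # expand nested groups such as [2, 4]
        else:
            flat.append(float(item))
    return flat


# Example vectors copied from the bullets above
carbon = [6, 2.55, 0.77, [2, 4], 0.0, 1]
water = [18.015, 1.85, [0.957, 104.5], [0.34, 0.17, 1.85], -75.0]

carbon_vec = flatten_features(carbon)
water_vec = flatten_features(water)
print(len(carbon_vec), carbon_vec)  # 7 [6.0, 2.55, 0.77, 2.0, 4.0, 0.0, 1.0]
print(len(water_vec))               # 8
```

The entries differ in scale by orders of magnitude (104.5 vs. 0.17). Hand-crafted vectors like these are therefore typically standardized per feature before being fed to a model.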

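Such representations underpin machine-learning models, molecular similarity search, and chemical reaction prediction. They provide a quantitative basis for computational research in chemistry and biology. Uni-Mol learns its vectors instead of hand-crafting them. A pretrained model maps a molecule's 3D structure to one molecular vector (the CLS-token representation) and to one vector per atom (the atomic representations). This notebook shows how to obtain and inspect both.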
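As an illustration of the similarity-search use case mentioned above, the sketch below ranks the rows of a vector matrix by cosine similarity to a query vector. The matrix is random toy data standing in for molecular vectors such as Uni-Mol's `cls_repr`. The 512-dimensional size and the helper name `cosine_top_k` are assumptions for illustration only.

```python
import numpy as np


def cosine_top_k(query: np.ndarray, database: np.ndarray, k: int = 3):
    """Return indices and cosine similarities of the k rows of `database`
    most similar to `query`, highest similarity first.

    Assumes no row (and not the query) has zero norm.
    """
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    sims = db @ q                    # cosine similarity of every row to the query
    order = np.argsort(-sims)[:k]    # indices sorted by descending similarity
    return order, sims[order]


# Toy stand-in for a matrix of molecular vectors (one row per molecule)
rng = np.random.default_rng(0)
db = rng.normal(size=(100, 512))
query = db[42] + 0.01 * rng.normal(size=512)   # slightly perturbed copy of row 42
idx, scores = cosine_top_k(query, db, k=3)
print(idx[0])  # 42
```

Normalizing both sides first turns cosine similarity into a plain dot product. One matrix-vector product then scores the whole database.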


Import Uni-Mol

[1]
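# Environment: unimol-qsar image (recommended: Uni-Mol:unimol-qsar:v0.5) on a GPU machine type
# Import UniMolRepr: current releases ship it in the unimol_tools package,
# while the image used here exposes it as unimol
try:
    from unimol_tools import UniMolRepr
except ModuleNotFoundError:
    from unimol import UniMolRepr
import numpy as np
import pandas as pd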
[ ]
# Single SMILES UniMol representation
clf = UniMolRepr(data_type='molecule', remove_hs=False)
smiles = 'c1ccc(cc1)C2=NCC(=O)Nc3c2cc(cc3)[N+](=O)[O]'
smiles_list = [smiles]
unimol_repr = clf.get_repr(smiles_list, return_atomic_reprs=True)

# Uni-Mol molecular representation using the cls token
print(np.array(unimol_repr['cls_repr']).shape)

# Uni-Mol atomic-level representation, consistent with the atom order in rdkit Mol
print(np.array(unimol_repr['atomic_reprs']).shape)
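With a single SMILES, `np.array(unimol_repr['atomic_reprs'])` yields a regular array. With several SMILES, each molecule contributes its own per-atom array, and the row counts differ (hydrogens are included here because `remove_hs=False`). The list is then ragged and cannot be stacked directly. The sketch below assumes each entry is an `(n_atoms, dim)` array, as the single-molecule example suggests. It shows two common workarounds: mean pooling to one vector per molecule, and zero padding with a mask. The helper names are hypothetical, and the small arrays are synthetic stand-ins for real Uni-Mol output.

```python
import numpy as np


def mean_pool(atomic_reprs):
    """Average each molecule's (n_atoms, dim) array over atoms -> (n_molecules, dim)."""
    return np.stack([np.asarray(a).mean(axis=0) for a in atomic_reprs])


def pad_atomic(atomic_reprs):
    """Zero-pad ragged per-atom arrays to (n_molecules, max_atoms, dim) plus a boolean mask."""
    arrays = [np.asarray(a) for a in atomic_reprs]
    max_atoms = max(a.shape[0] for a in arrays)
    dim = arrays[0].shape[1]
    padded = np.zeros((len(arrays), max_atoms, dim), dtype=arrays[0].dtype)
    mask = np.zeros((len(arrays), max_atoms), dtype=bool)
    for i, a in enumerate(arrays):
        padded[i, : a.shape[0]] = a    # copy real atoms; the remaining rows stay zero
        mask[i, : a.shape[0]] = True   # True marks real atoms, False marks padding
    return padded, mask


# Synthetic stand-in for unimol_repr['atomic_reprs']: two molecules with 5 and 3 atoms
fake_atomic = [np.ones((5, 4)), np.full((3, 4), 2.0)]
print(mean_pool(fake_atomic).shape)      # (2, 4)
padded, mask = pad_atomic(fake_atomic)
print(padded.shape, mask.sum(axis=1))    # (2, 5, 4) [5 3]
```

Mean pooling gives a fixed-length molecule vector that complements `cls_repr`. Padding keeps per-atom detail for models that can consume a mask.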
[ ]
%%bash
# Download the sample dataset (CNS drug data)
rm -rf mol_train.csv
wget -nv https://bohrium-example.oss-cn-zhangjiakou.aliyuncs.com/unimol-qsar/mol_train.csv
[ ]
# Load the dataset once and featurize every SMILES with Uni-Mol
df = pd.read_csv('mol_train.csv')
smiles_list = df['SMILES'].to_list()
y = df['TARGET'].to_numpy()   # NumPy array, so `y == label` yields a boolean mask
repr_dict = clf.get_repr(smiles_list)
unimol_repr_list = np.array(repr_dict['cls_repr'])
[ ]
print(unimol_repr_list.shape)
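`unimol_repr_list` now holds one vector per molecule, aligned row by row with `y`, which is the input a downstream model needs. This notebook only visualizes the vectors. As a minimal sketch of the machine-learning use mentioned earlier, the block below trains a nearest-centroid classifier. That baseline is swapped in for illustration and is not the author's method. The helper names are hypothetical, and the demo runs on synthetic two-class data rather than the CNS dataset.

```python
import numpy as np


def train_test_split_idx(n, test_frac=0.2, seed=0):
    """Shuffle indices 0..n-1 and split them into train and test index arrays."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_test = int(round(n * test_frac))
    return idx[n_test:], idx[:n_test]


def nearest_centroid_fit(X, y):
    """Compute one mean vector (centroid) per class label."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def nearest_centroid_predict(X, classes, centroids):
    """Assign each row of X to the class whose centroid is closest (Euclidean)."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]


# Synthetic stand-in for (unimol_repr_list, y): two well-separated classes
rng = np.random.default_rng(1)
X_demo = np.vstack([rng.normal(0.0, 1.0, (50, 16)), rng.normal(3.0, 1.0, (50, 16))])
y_demo = np.array([0] * 50 + [1] * 50)

train, test = train_test_split_idx(len(y_demo))
classes, centroids = nearest_centroid_fit(X_demo[train], y_demo[train])
pred = nearest_centroid_predict(X_demo[test], classes, centroids)
accuracy = (pred == y_demo[test]).mean()
print("test accuracy:", accuracy)
```

On the real data, the same calls would take `unimol_repr_list[train]` and `np.asarray(y)[train]`. How well such a simple baseline works there is not established here.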
[ ]
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
[ ]
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(unimol_repr_list)
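`PCA(n_components=2).fit_transform` centers the data and projects it onto the two directions of largest variance. The NumPy sketch below does the same through a singular value decomposition, and its coordinates should agree with scikit-learn's up to the sign of each axis. It also reports how much of the total variance the two kept components explain. For the fitted object above, scikit-learn exposes that quantity as `pca.explained_variance_ratio_`. The helper name `pca_svd` and the synthetic demo data are assumptions for illustration.

```python
import numpy as np


def pca_svd(X, n_components=2):
    """Project X onto its top principal components via SVD of the centered data.

    Returns the projected coordinates and the fraction of total variance
    explained by each kept component.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                        # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T              # coordinates on the top components
    explained = S**2 / np.sum(S**2)                # variance share of every component
    return scores, explained[:n_components]


# Synthetic data whose feature variances decrease from the first column to the last
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 10)) * np.arange(10, 0, -1)
Z, ratio = pca_svd(X_demo, n_components=2)
print(Z.shape, ratio.round(3))
```

If the two ratios sum to a small fraction, the 2-D scatter plot shows only a small part of the structure in the 512-dimensional representations. In that case, class overlap in the plot does not necessarily mean overlap in the full space.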
[ ]
# Visualize the 2-D PCA projection, one color and marker per TARGET class
colors = ['r', 'g', 'b']
markers = ['s', 'o', 'D']
labels = ['Target: 0', 'Target: 1']

plt.figure(figsize=(8, 6))

for label, color, marker in zip(np.unique(y), colors, markers):
    mask = y == label  # boolean mask selecting molecules of this class
    plt.scatter(X_reduced[mask, 0],
                X_reduced[mask, 1],
                c=color,
                marker=marker,
                label=labels[label],
                edgecolors='black')

plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend(loc='best')
plt.title('Uni-Mol representation (PCA)')
plt.show()