Uni-Mol Docking Demonstration (Using PoseBusters as an Example)
Yani Guan
Updated 2024-10-24
Recommended image: unimol-docking:pytorch1.12.1-cuda11.6
Recommended machine type: c3_m4_1 * NVIDIA T4

©️ All rights reserved 2023 @ Author
Author: Gengmo Zhou 📨
Date: 2024-7-24
Licenses: This Bohrium notebook uses Uni-Mol model parameters, and its output content follows the Creative Commons Attribution 4.0 International (CC BY 4.0) license. You can find detailed information at: http://creativecommons.org/licenses/by-nc-sa/4.0
Quick Start: Click the Start Connection button above, then select the unimol-docking:pytorch1.12.1-cuda11.6 image and a GPU machine. The cheapest GPU option is sufficient.


Background


About Uni-Mol

Uni-Mol is a universal 3D molecular representation learning framework based on molecular structures, released by DeepModeling in May 2022. Uni-Mol includes two pre-trained models, both adopting the same SE(3) Transformer architecture: one is a molecular model pre-trained with 209M molecular conformations; the other is a pocket model pre-trained with 3 million candidate protein pocket data.

Utilizing 3D structural information combined with an effective pre-training scheme enables Uni-Mol to surpass previous best methods in 14 out of 15 molecular property prediction tasks. Notably, Uni-Mol excels in 3D space-related tasks, including protein-ligand binding pose prediction, molecular conformation generation, etc. The paper has been accepted by the top machine learning conference ICLR 2023.


About PoseBusters

PoseBusters is a Python package that performs a series of standard quality checks using the well-known cheminformatics toolkit RDKit. Only those methods that pass these checks and predict binding modes similar to natural ones should be considered to have "state-of-the-art" performance.

The PoseBusters benchmark set is a new, carefully curated, publicly available set of crystal complexes from the PDB. It is a diverse, recent collection of high-quality protein-ligand complexes containing drug-like molecules. It only includes complexes released since 2021, thus excluding any complexes from the PDBbind general set v2020, which was used to train Uni-Mol.


Preparation Before Running

Environment

  • Base Docker image:
dptechnology/unicore:latest-pytorch1.12.1-cuda11.6-rdma
  • Other dependencies: RDKit and BioPandas:
rdkit==2022.9.3
biopandas==0.4.1
  • Data: Protein PDB files and ligand SDF files from PoseBusters and Astex
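
To confirm that the running environment matches the pinned dependencies above, a small check along the following lines can be used. This is only a sketch: the helper name check_pins is hypothetical, and it assumes the installed distribution names are exactly rdkit and biopandas, which may not hold for every installation method.

```python
from importlib import metadata

# Pins copied from the dependency list above.
PINS = {"rdkit": "2022.9.3", "biopandas": "0.4.1"}


def check_pins(pins):
    """Return {package: (expected, installed)} for every pin that does not match.

    `installed` is None when no distribution with that name is found.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


print(check_pins(PINS) or "all pinned versions match")
```

An empty result means both packages are installed at the pinned versions. A mismatch does not necessarily break the notebook, but it is the first thing to check if preprocessing behaves differently.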

Code, Data, and Model

  • Code link: https://github.com/deepmodeling/Uni-Mol

  • Commit: b962451 (b962451a019e15363bd34b3af9d3a3cd02330947)

  • Project path: /workspace/Uni-Mol

  • Data path: /workspace/Uni-Mol/eval_sets

  • Model path: /workspace/Uni-Mol/ckp/binding_pose_220908.pt (can be downloaded from the GitHub repository)
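
Before running the cells below, it can help to verify that the layout listed above is actually in place. The following is a minimal sketch; the helper name missing_paths is hypothetical, and the three checked entries are taken from the paths used in this notebook.

```python
from pathlib import Path


def missing_paths(project_root):
    """Return the expected files and folders that do not exist under project_root."""
    root = Path(project_root)
    expected = [
        root / "unimol" / "infer.py",              # inference script called later
        root / "eval_sets",                        # PoseBusters and Astex data
        root / "ckp" / "binding_pose_220908.pt",   # public model weights
    ]
    return [str(p) for p in expected if not p.exists()]


print(missing_paths("/workspace/Uni-Mol"))
```

An empty list means the code, data, and weights are where the later cells expect them.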


Running

Import Modules

[1]
import os
import pickle
import numpy as np
import pandas as pd
from rdkit import Chem, RDLogger
from rdkit.Chem import AllChem
from tqdm import tqdm
RDLogger.DisableLog('rdApp.*')
import warnings
warnings.filterwarnings(action='ignore')
from multiprocessing import Pool
import copy
import lmdb
from biopandas.pdb import PandasPdb
from sklearn.cluster import KMeans
from rdkit.Chem.rdMolAlign import AlignMolConformers

Data Preprocessing Function for Generating lmdb Files

Ligand Preparation

Extract each ligand from its SDF file and use RDKit to generate 100 conformations for it. Then cluster these conformations into 10 groups with k-means and take the conformation nearest to each cluster center as an initial input for the model.
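
The selection step can be illustrated on toy coordinates. The sketch below is not the notebook's implementation: a plain NumPy Lloyd iteration with farthest-point initialization stands in for sklearn's KMeans, and the name pick_representatives is hypothetical. It only shows the idea of flattening each conformer, clustering, and keeping the conformer closest to each center.

```python
import numpy as np


def pick_representatives(coords, n_clusters, n_iter=20):
    """coords: (num_confs, num_atoms, 3). Return one conformer index per cluster."""
    flat = coords.reshape(len(coords), -1).astype(np.float64)
    # Farthest-point initialization: deterministic and spreads the starting centers.
    center_ids = [0]
    for _ in range(1, n_clusters):
        d2 = ((flat[:, None] - flat[center_ids][None]) ** 2).sum(-1).min(axis=1)
        center_ids.append(int(d2.argmax()))
    centers = flat[center_ids].copy()
    for _ in range(n_iter):
        d2 = ((flat[:, None] - centers[None]) ** 2).sum(-1)  # (num_confs, n_clusters)
        labels = d2.argmin(axis=1)
        for k in range(n_clusters):
            members = flat[labels == k]
            if len(members):
                centers[k] = members.mean(axis=0)
    d2 = ((flat[:, None] - centers[None]) ** 2).sum(-1)
    return d2.argmin(axis=0)  # nearest conformer to each center, shape (n_clusters,)


# Toy data: two well-separated groups of five 4-atom "conformers".
rng = np.random.default_rng(0)
confs = np.concatenate([
    rng.normal(0.0, 0.05, size=(5, 4, 3)),
    rng.normal(10.0, 0.05, size=(5, 4, 3)),
])
reps = pick_representatives(confs, n_clusters=2)
print(reps)
```

Keeping a real conformer rather than the cluster center itself matters because an averaged center is generally not a chemically valid geometry.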

Protein Preparation

Pocket residues are defined as the residues that have at least one heavy atom within 6 Å of any heavy atom of the crystal ligand. The atoms of these residues are then extracted, and metal and rare-element atoms are filtered out to obtain the pocket atoms used as model input.
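
The pocket definition can be sketched without any structure-file parsing. The example below is a simplified illustration, assuming protein atoms are already available as (chain_id, residue_number, element, xyz) tuples; the function name pocket_residues and that tuple layout are hypothetical and not part of the notebook.

```python
import math


def pocket_residues(protein_atoms, ligand_coords, cutoff=6.0):
    """Return the set of (chain_id, residue_number) that have a heavy atom
    within `cutoff` angstroms of any ligand heavy atom."""
    selected = set()
    for chain, resnum, element, xyz in protein_atoms:
        if element == "H":
            continue  # hydrogens never qualify a residue
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_coords):
            selected.add((chain, resnum))
    return selected


ligand = [(0.0, 0.0, 0.0)]
atoms = [
    ("A", 10, "C", (3.0, 0.0, 0.0)),  # 3 Å away: selected
    ("A", 11, "H", (1.0, 0.0, 0.0)),  # close, but hydrogen: ignored
    ("A", 11, "C", (7.0, 0.0, 0.0)),  # 7 Å away: not selected
    ("B", 5, "O", (0.0, 6.0, 0.0)),   # exactly 6 Å: selected
]
print(pocket_residues(atoms, ligand))
```

Once a residue qualifies, all of its atoms are taken, which is why the pocket can extend somewhat beyond the 6 Å cutoff.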

[2]
# allowed atom types
main_atoms = ['N', 'CA', 'C', 'O', 'H']
allow_pocket_atoms = ['C', 'H', 'N', 'O', 'S']

def cal_configs(coords):
    """Calculate the pocket box: center and edge lengths along x, y, z."""
    centerx, centery, centerz = list((np.max(coords, axis=0) + np.min(coords, axis=0)) / 2)
    # box size is the full extent (max - min) along each axis
    sizex, sizey, sizez = list(np.max(coords, axis=0) - np.min(coords, axis=0))
    config = {'cx': centerx, 'cy': centery, 'cz': centerz,
              'sx': sizex, 'sy': sizey, 'sz': sizez}
    return config, centerx, centery, centerz, sizex, sizey, sizez


def filter_pocketatoms(atom):
    """Return the atom name if it is allowed in the pocket, otherwise None."""
    if atom[:2] in ['Cd', 'Cs', 'Cn', 'Ce', 'Cm', 'Cf', 'Cl', 'Ca',
                    'Cr', 'Co', 'Cu', 'Nh', 'Nd', 'Np', 'No', 'Ne', 'Na',
                    'Ni', 'Nb', 'Os', 'Og', 'Hf', 'Hg', 'Hs', 'Ho', 'He',
                    'Sr', 'Sn', 'Sb', 'Sg', 'Sm', 'Si', 'Sc', 'Se']:
        return None
    # strip a leading digit and check the rest of the name
    if atom[0] >= '0' and atom[0] <= '9':
        return filter_pocketatoms(atom[1:])
    if atom[0] in ['Z', 'M', 'P', 'D', 'F', 'K', 'I', 'B']:
        return None
    if atom[0] in allow_pocket_atoms:
        return atom
    return atom


def single_conf_gen(tgt_mol, num_confs=1000, seed=42, removeHs=True):
    """Embed num_confs conformers with RDKit and relax each with MMFF."""
    mol = copy.deepcopy(tgt_mol)
    mol = Chem.AddHs(mol)
    allconformers = AllChem.EmbedMultipleConfs(mol, numConfs=num_confs, randomSeed=seed, clearConfs=True)
    sz = len(allconformers)
    for i in range(sz):
        try:
            AllChem.MMFFOptimizeMolecule(mol, confId=i)
        except Exception:
            # keep the unoptimized conformer if MMFF fails
            continue
    if removeHs:
        mol = Chem.RemoveHs(mol)
    return mol


def clustering_coords(mol, M=1000, N=100, seed=42, removeHs=True, method='bonds'):
    """Generate M conformers and return the N conformers nearest to the k-means centers."""
    rdkit_coords_list = []
    if method == 'rdkit_MMFF':
        rdkit_mol = single_conf_gen(mol, num_confs=M, seed=seed, removeHs=removeHs)
    else:
        raise ValueError('no conformer generation methods:{}'.format(method))
    noHsIds = [atom.GetIdx() for atom in rdkit_mol.GetAtoms() if atom.GetAtomicNum() != 1]
    # exclude hydrogens for aligning
    AlignMolConformers(rdkit_mol, atomIds=noHsIds)
    sz = len(rdkit_mol.GetConformers())
    for i in range(sz):
        _coords = rdkit_mol.GetConformers()[i].GetPositions().astype(np.float32)
        rdkit_coords_list.append(_coords)
    # cluster confs, select the nearest conf to the center
    # (num_confs, num_atoms, 3)
    rdkit_coords = np.array(rdkit_coords_list)[:, noHsIds]
    # (num_confs, num_atoms, 3) -> (num_confs, num_atoms*3)
    rdkit_coords_flatten = rdkit_coords.reshape(sz, -1)
    kmeans = KMeans(n_clusters=N, random_state=seed).fit(rdkit_coords_flatten)
    # (num_clusters, num_atoms, 3)
    center_coords = kmeans.cluster_centers_.reshape(N, -1, 3)
    # (num_clusters, num_confs)
    cdist = ((center_coords[:, None] - rdkit_coords[None, :])**2).sum(axis=(-1, -2))
    # (num_clusters,)
    argmin = np.argmin(cdist, axis=-1)
    coords_list = [rdkit_coords_list[i] for i in argmin]
    return coords_list


def extract_pose_posebuster(content):
    """Build pocket and ligand inputs for one complex; return None if it is skipped."""
    pdbid, ligid, protein_path, ligand_path, _ = content

    def read_pdb(path, pdbid):
        # protein preparation
        pfile = os.path.join(path, pdbid + '.pdb')
        pmol = PandasPdb().read_pdb(pfile)
        return pmol

    def read_mol(path, pdbid, ligid):
        # PoseBusters ligands are named {pdb_code}_{lig_code}.sdf
        lsdf = os.path.join(path, f'{pdbid}_{ligid}.sdf')
        supp = Chem.SDMolSupplier(lsdf)
        mols = [mol for mol in supp if mol]
        if len(mols) == 0:
            raise ValueError(f'no readable molecule in {lsdf}')
        return mols[0]

    # distance cutoff (in Å) that controls the pocket size
    dist_thres = 6
    if pdbid == 'index' or pdbid == 'readme':
        return None

    pmol = read_pdb(protein_path, pdbid)
    pname = pdbid
    mol = read_mol(ligand_path, pdbid, ligid)
    mol = Chem.RemoveHs(mol)
    lcoords = mol.GetConformer().GetPositions().astype(np.float32)
    pdf = pmol.df['ATOM']
    filter_std = []
    # collect residues with a heavy atom within dist_thres of any ligand heavy atom
    for lcoord in lcoords:
        pdf['dist'] = pmol.distance(xyz=list(lcoord), records=('ATOM',))
        df = pdf[(pdf.dist <= dist_thres) & (pdf.element_symbol != 'H')][['chain_id', 'residue_number']]
        filter_std += list(zip(df.chain_id.tolist(), df.residue_number.tolist()))

    filter_std = set(filter_std)
    patoms, pcoords, residues = [], np.empty((0, 3)), []
    for chain, res in filter_std:
        df = pdf[(pdf.chain_id == chain) & (pdf.residue_number == res)]
        patoms += df['atom_name'].tolist()
        pcoords = np.concatenate((pcoords, df[['x_coord', 'y_coord', 'z_coord']].to_numpy()), axis=0)
        residues += [str(chain) + str(res)] * len(df)

    if len(pcoords) == 0:
        print('empty pocket:', pdbid)
        return None
    config, centerx, centery, centerz, sizex, sizey, sizez = cal_configs(pcoords)

    # filter out unusual atoms, including metals
    atoms, keep, residues_tmp = [], [], []
    for i, a in enumerate(patoms):
        output = filter_pocketatoms(a)
        if output is not None:
            keep.append(True)
            atoms.append(output)
            residues_tmp.append(residues[i])
        else:
            keep.append(False)
    coordinates = pcoords[keep].astype(np.float32)
    residues = residues_tmp

    assert len(atoms) == len(residues)
    assert len(atoms) == coordinates.shape[0]

    if len(atoms) != coordinates.shape[0]:
        print(pname)
        return None
    patoms = atoms
    pcoords = [coordinates]
    # 0 for backbone atoms, 1 for side-chain atoms
    side = [0 if a in main_atoms else 1 for a in patoms]

    smiles = Chem.MolToSmiles(mol)
    mol = AllChem.AddHs(mol, addCoords=True)
    latoms = [atom.GetSymbol() for atom in mol.GetAtoms()]
    holo_coordinates = [mol.GetConformer().GetPositions().astype(np.float32)]
    holo_mol = mol
    # generate M conformers and keep N cluster representatives
    M, N = 100, 10
    coordinate_list = clustering_coords(mol, M=M, N=N, seed=42, removeHs=False, method='rdkit_MMFF')
    mol_list = [mol] * N
    ligand = [latoms, coordinate_list, holo_coordinates, smiles, mol_list, holo_mol]

    return pname, patoms, pcoords, side, residues, config, ligand


def parser(content):
    """Serialize one complex for lmdb; return None if the complex was skipped."""
    result = extract_pose_posebuster(content)
    if result is None:
        # write_lmdb counts a None result as a failed entry
        return None
    pname, patoms, pcoords, side, residues, config, ligand = result
    latoms, coordinate_list, holo_coordinates, smiles, mol_list, holo_mol = ligand
    return pickle.dumps(
        {
            "atoms": latoms,
            "coordinates": coordinate_list,
            "mol_list": mol_list,
            "pocket_atoms": patoms,
            "pocket_coordinates": pcoords,
            "side": side,
            "residue": residues,
            "config": config,
            "holo_coordinates": holo_coordinates,
            "holo_mol": holo_mol,
            "holo_pocket_coordinates": pcoords,
            "smi": smiles,
            'pocket': pname,
            'scaffold': pname,
        },
        protocol=-1,
    )


def write_lmdb(protein_path, ligand_path, outpath, meta_info_file, lmdb_name, num_ligand=428, nthreads=8):
    """Preprocess the first num_ligand complexes in the meta file and write them to an lmdb file."""
    os.makedirs(outpath, exist_ok=True)
    df = pd.read_csv(meta_info_file)
    print(f'Example of meta_info content: \n{df.head(1)}')
    pdb_ids = list(df['pdb_code'].values)[:num_ligand]
    lig_ids = list(df['lig_code'].values)[:num_ligand]
    print(f'pdb code: {pdb_ids} \nlig code: {lig_ids}')
    content_list = list(zip(pdb_ids, lig_ids, [protein_path]*len(pdb_ids), [ligand_path]*len(pdb_ids), range(len(pdb_ids))))
    outputfilename = os.path.join(outpath, lmdb_name + '.lmdb')
    # remove a stale database left by a previous run, if any
    try:
        os.remove(outputfilename)
    except FileNotFoundError:
        pass
    env_new = lmdb.open(
        outputfilename,
        subdir=False,
        readonly=False,
        lock=False,
        readahead=False,
        meminit=False,
        max_readers=1,
        map_size=int(100e9),
    )
    txn_write = env_new.begin(write=True)
    print("Start preprocessing data...")
    print(f'Number of systems: {len(pdb_ids)}')
    with Pool(nthreads) as pool:
        i = 0
        failed_num = 0
        for inner_output in tqdm(pool.imap(parser, content_list)):
            if inner_output is not None:
                # keys are consecutive integers over the successful entries
                txn_write.put(f"{i}".encode("ascii"), inner_output)
                i += 1
            else:
                failed_num += 1
    txn_write.commit()
    env_new.close()
    print(f'\nTotal num: {len(pdb_ids)}, Success: {i}, Failed: {failed_num}')
    print("Done!")

Generating lmdb Files for Model Input from Protein pdb Files and Ligand sdf Files

Data Description eval_sets

  • PoseBusters data (428 entries) and Astex data (85 entries) are stored in the posebusters and astex folders under eval_sets, respectively. The parse_protein.py script in the same directory is used to process the downloaded raw pdb and sdf files.

  • In the posebusters directory, the protein and ligand folders store the processed pdb and sdf files after running the parse_protein script. The naming format for pdb files is {pdb_code}.pdb, and for sdf files, it is {pdb_code}_{lig_code}.sdf. The posebuster_set_meta.csv file contains the pdb code, ligand code, and corresponding download URLs for each entry in the PoseBusters benchmark. The raw data is downloaded from PDB via these URLs.

  • The data directory structure for Astex is similar to that of PoseBusters.
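
The naming convention described above can be made concrete with a short sketch that derives the expected file names from the meta CSV. The helper name expected_files is hypothetical; the column names pdb_code and lig_code and the two sample rows are taken from the output shown later in this notebook.

```python
import csv
import io


def expected_files(meta_csv_text, limit=None):
    """Return (pdb file name, sdf file name) pairs for the rows of a meta CSV."""
    rows = list(csv.DictReader(io.StringIO(meta_csv_text)))
    if limit is not None:
        rows = rows[:limit]
    return [
        (f"{r['pdb_code']}.pdb", f"{r['pdb_code']}_{r['lig_code']}.sdf")
        for r in rows
    ]


# Only the two columns needed for naming are included in this sample.
sample = "pdb_code,lig_code\n5S8I,2LY\n5SAK,ZRY\n"
print(expected_files(sample))
```

For a real run, the text of posebuster_set_meta.csv would be passed in place of the sample string.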

Here, for demonstration purposes, the first two complexes are selected.

[3]
### workspace
project_path='/workspace/Uni-Mol'

# num of threads during preprocessing, the same as the num of CPUs.
nthreads = 12

### for posebusters
protein_path = f'{project_path}/eval_sets/posebusters/proteins'
ligand_path = f'{project_path}/eval_sets/posebusters/ligands'
lmdb_path = f'{project_path}/posebuster'
meta_info_file = f'{project_path}/eval_sets/posebusters/posebuster_set_meta.csv'
lmdb_name = 'posebuster_428'
num_ligand = 2 # choose the first two complexes to save time

### for astex
# protein_path = f'{project_path}/eval_sets/astex/proteins'
# ligand_path = f'{project_path}/eval_sets/astex/ligands'
# lmdb_path = f'{project_path}/astex'
# meta_info_file = f'{project_path}/eval_sets/astex/astex_set_meta.csv'
# lmdb_name = 'astex_85'
# num_ligand = 85

### generate lmdb
write_lmdb(protein_path, ligand_path, lmdb_path, meta_info_file, lmdb_name, num_ligand=num_ligand, nthreads=nthreads)
Example of meta_info content: 
  pdb_code lig_code                                  prot_url  \
0     5S8I      2LY  https://files.rcsb.org/download/5S8I.pdb   

                                             lig_url  \
0  http://ligand-expo.rcsb.org/files/2/2LY/isdf/5...   

                                            ligs  
0  <rdkit.Chem.rdchem.Mol object at 0x172fc16c0>  
pdb code: ['5S8I', '5SAK'] 
lig code: ['2LY', 'ZRY']
Start preprocessing data...
Number of systems: 2
2it [00:03,  1.58s/it]
Total num: 2, Success: 2, Failed: 0
Done!


Inference Using Public Model Weights

This command is the same as the one in the Uni-Mol README.

The public model weights for protein-ligand binding pose prediction can be downloaded from the Uni-Mol repository.

[4]
data_path=lmdb_path
results_path=f'{project_path}/infer_pose' # replace to your results path
weight_path=f'{project_path}/ckp/binding_pose_220908.pt'
batch_size=8
dist_threshold=8.0
recycling=3
valid_subset=lmdb_name
mol_dict_name='dict_mol.txt'
pocket_dict_name='dict_pkt.txt'

!cp $project_path/example_data/molecule/dict.txt $data_path/$mol_dict_name
!cp $project_path/example_data/pocket/dict_coarse.txt $data_path/$pocket_dict_name
!python $project_path/unimol/infer.py --user-dir $project_path/unimol $data_path --valid-subset $valid_subset \
--results-path $results_path \
--num-workers 8 --ddp-backend=c10d --batch-size $batch_size \
--task docking_pose --loss docking_pose --arch docking_pose \
--path $weight_path \
--fp16 --fp16-init-scale 4 --fp16-scale-window 256 \
--dist-threshold $dist_threshold --recycling $recycling \
--log-interval 50 --log-format simple
2024-07-23 18:59:12 | INFO | unimol.inference | loading model(s) from /workspace/Uni-Mol/ckp/binding_pose_220908.pt
2024-07-23 18:59:12 | INFO | unimol.tasks.docking_pose | ligand dictionary: 30 types
2024-07-23 18:59:12 | INFO | unimol.tasks.docking_pose | pocket dictionary: 9 types
2024-07-23 18:59:16 | INFO | unimol.inference | Namespace(activation_dropout=0.0, activation_fn='gelu', adam_betas='(0.9, 0.999)', adam_eps=1e-08, all_gather_list_size=16384, allreduce_fp32_grad=False, arch='docking_pose', attention_dropout=0.1, batch_size=8, batch_size_valid=8, bf16=False, bf16_sr=False, broadcast_buffers=False, bucket_cap_mb=25, conf_size=10, cpu=False, curriculum=0, data='/workspace/Uni-Mol/posebuster', data_buffer_size=10, ddp_backend='c10d', delta_pair_repr_norm_loss=-1.0, device_id=0, disable_validation=False, dist_threshold=8.0, distributed_backend='nccl', distributed_init_method=None, distributed_no_spawn=False, distributed_num_procs=1, distributed_port=-1, distributed_rank=0, distributed_world_size=1, dropout=0.1, ema_decay=-1.0, emb_dropout=0.1, empty_cache_freq=0, encoder_attention_heads=64, encoder_embed_dim=512, encoder_ffn_embed_dim=2048, encoder_layers=15, fast_stat_sync=False, find_unused_parameters=False, finetune_mol_model=None, finetune_pocket_model=None, fix_batches_to_gpus=False, fixed_validation_seed=None, force_anneal=None, fp16=True, fp16_init_scale=4, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=256, log_format='simple', log_interval=50, loss='docking_pose', lr_scheduler='fixed', lr_shrink=0.1, masked_coord_loss=-1.0, masked_dist_loss=-1.0, masked_token_loss=-1.0, max_pocket_atoms=256, max_seq_len=512, max_valid_steps=None, min_loss_scale=0.0001, model_overrides='{}', mol=Namespace(activation_dropout=0.0, activation_fn='gelu', attention_dropout=0.1, delta_pair_repr_norm_loss=-1.0, dropout=0.1, emb_dropout=0.1, encoder_attention_heads=64, encoder_embed_dim=512, encoder_ffn_embed_dim=2048, encoder_layers=15, masked_coord_loss=-1.0, masked_dist_loss=-1.0, masked_token_loss=-1.0, max_seq_len=512, pooler_activation_fn='tanh', pooler_dropout=0.0, post_ln=False, x_norm_loss=-1.0), no_progress_bar=False, no_seed_provided=False, nprocs_per_node=1, num_workers=8, optimizer='adam', 
path='/workspace/Uni-Mol/ckp/binding_pose_220908.pt', pocket=Namespace(activation_dropout=0.0, activation_fn='gelu', attention_dropout=0.1, delta_pair_repr_norm_loss=-1.0, dropout=0.1, emb_dropout=0.1, encoder_attention_heads=64, encoder_embed_dim=512, encoder_ffn_embed_dim=2048, encoder_layers=15, masked_coord_loss=-1.0, masked_dist_loss=-1.0, masked_token_loss=-1.0, max_seq_len=512, pooler_activation_fn='tanh', pooler_dropout=0.0, post_ln=False, x_norm_loss=-1.0), pooler_activation_fn='tanh', pooler_dropout=0.0, post_ln=False, profile=False, quiet=False, recycling=3, required_batch_size_multiple=1, results_path='/workspace/Uni-Mol/infer_pose', seed=1, skip_invalid_size_inputs_valid_test=False, suppress_crashes=False, task='docking_pose', tensorboard_logdir='', threshold_loss_scale=None, train_subset='train', user_dir='/workspace/Uni-Mol/unimol', valid_subset='posebuster_428', validate_after_updates=0, validate_interval=1, validate_interval_updates=0, validate_with_ema=False, wandb_name='', wandb_project='', warmup_updates=0, weight_decay=0.0, x_norm_loss=-1.0)
2024-07-23 18:59:16 | INFO | unicore.tasks.unicore_task | get EpochBatchIterator for epoch 1
2024-07-23 18:59:20 | INFO | unimol.inference | Done inference! 

Perform Docking Based on the Predicted Distance Matrix, Then Calculate the RMSD Metric

The script is the same as the one in the Uni-Mol README.

The RMSD calculated here is the plain RMSD, which does not account for molecular symmetry.
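
The difference between plain and symmetry-aware RMSD can be seen on a toy example. The sketch below is a brute-force illustration, not RDKit's algorithm: the caller supplies the groups of interchangeable atom indices, and the function names rmsd and symmetric_rmsd are hypothetical.

```python
import itertools
import math


def rmsd(a, b):
    """Plain RMSD between two equally ordered coordinate lists, without alignment."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def symmetric_rmsd(pred, ref, equivalent_groups):
    """Minimum RMSD over all permutations of pred atoms inside each equivalent group."""
    best = rmsd(pred, ref)
    per_group = [list(itertools.permutations(g)) for g in equivalent_groups]
    for combo in itertools.product(*per_group):
        order = list(range(len(pred)))
        for group, perm in zip(equivalent_groups, combo):
            for src, dst in zip(group, perm):
                order[src] = dst
        best = min(best, rmsd([pred[i] for i in order], ref))
    return best


# A carboxylate-like fragment: atom 0 is carbon, atoms 1 and 2 are equivalent oxygens.
ref = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (1.0, -1.0, 0.0)]
pred = [(0.0, 0.0, 0.0), (1.0, -1.0, 0.0), (1.0, 1.0, 0.0)]  # same pose, oxygens swapped

print(rmsd(pred, ref))                      # about 1.633: penalized only for atom labeling
print(symmetric_rmsd(pred, ref, [(1, 2)]))  # 0.0: the pose is physically identical
```

Plain RMSD can therefore overestimate the error for ligands with symmetric groups, which is why a symmetry-corrected value is also reported further down.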

[5]
nthreads=nthreads
predict_file=f"{results_path}/ckp_{lmdb_name}.out.pkl" # Your inference file dir
reference_file=f"{lmdb_path}/{lmdb_name}.lmdb" # Your reference file dir
output_path=f"{project_path}/{lmdb_name}_predict_sdf" # Docking results path

%cd $project_path
!python $project_path/unimol/utils/docking.py --nthreads $nthreads --predict-file $predict_file --reference-file $reference_file --output-path $output_path
/workspace/Uni-Mol
100%|█████████████████████████████████████████████| 2/2 [00:00<00:00, 12.41it/s]
  0%|                                                     | 0/2 [00:00<?, ?it/s]5SAK-N=C1N/C(=N\Nc2ccccc2)c2ccccc21-RMSD:0.6344-0.0361-0.2067
5S8I-CNC(=O)c1scc2c1OCCO2-RMSD:1.0975-0.0144-0.2149
100%|█████████████████████████████████████████████| 2/2 [00:23<00:00, 11.97s/it]
RMSD < 1.0 :  0.5
RMSD < 1.5 :  1.0
RMSD < 2.0 :  1.0
RMSD < 3.0 :  1.0
RMSD < 5.0 :  1.0
avg RMSD :  0.8659875802603305
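
The summary lines above can be reproduced by hand from the two per-complex values in the log. The sketch below assumes that the first number after "RMSD:" on each log line is the ligand RMSD in Å, which is consistent with the reported average up to rounding; the function name summarize is hypothetical.

```python
# Values copied from the log above (assumed to be the ligand RMSD in Å).
per_complex_rmsd = {"5SAK": 0.6344, "5S8I": 1.0975}


def summarize(values, thresholds=(1.0, 1.5, 2.0, 3.0, 5.0)):
    """Return ({threshold: fraction of values below it}, mean value)."""
    values = list(values)
    rates = {t: sum(v < t for v in values) / len(values) for t in thresholds}
    return rates, sum(values) / len(values)


rates, avg = summarize(per_complex_rmsd.values())
for threshold, rate in rates.items():
    print(f"RMSD < {threshold} : {rate}")
print(f"avg RMSD : {avg:.4f}")
```

With only two complexes, each success rate can only be 0.0, 0.5, or 1.0, so these numbers demonstrate the workflow rather than benchmark performance.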

Calculate Symmetric RMSD Metric

[6]
from rdkit.Chem.rdMolAlign import CalcRMS

def get_mol(sdf_path):
    """Read the first valid molecule from an SDF file."""
    supp = Chem.SDMolSupplier(sdf_path)
    mols = [mol for mol in supp if mol]
    if len(mols) == 0:
        raise ValueError(f'no readable molecule in {sdf_path}')
    return mols[0]

def get_sym_rmsd(predicted_sdf_path, reference_sdf_path, meta_info_file, num_ligand=2):
    """Symmetry-aware RMSD (no alignment) for the first num_ligand complexes."""
    df = pd.read_csv(meta_info_file)
    pdb_ids = list(df['pdb_code'].values)[:num_ligand]
    lig_ids = list(df['lig_code'].values)[:num_ligand]
    print(f'calc rmsd for: \npdb code: {pdb_ids} \nlig code: {lig_ids}')
    sym_rmsd_results = []
    for pdbid, ligid in zip(pdb_ids, lig_ids):
        ref_sdf = os.path.join(reference_sdf_path, f'{pdbid}_{ligid}.sdf')
        prb_sdf = os.path.join(predicted_sdf_path, f'{pdbid}.ligand.sdf')
        ref_mol = get_mol(ref_sdf)
        prb_mol = get_mol(prb_sdf)
        # CalcRMS accounts for symmetry and does not align the molecules
        sym_rmsd = CalcRMS(
            Chem.RemoveHs(prb_mol),
            Chem.RemoveHs(ref_mol)
        )
        sym_rmsd_results.append(sym_rmsd)
    sym_rmsd_results = np.array(sym_rmsd_results)
    return sym_rmsd_results

def print_results(rmsd_results):
    """Print success rates at several RMSD thresholds and the mean RMSD."""
    print('*'*50)
    print(f'results length: {len(rmsd_results)}')
    print('RMSD < 1.0 : ', np.mean(rmsd_results<1.0))
    print('RMSD < 1.5 : ', np.mean(rmsd_results<1.5))
    print('RMSD < 2.0 : ', np.mean(rmsd_results<2.0))
    print('RMSD < 3.0 : ', np.mean(rmsd_results<3.0))
    print('RMSD < 5.0 : ', np.mean(rmsd_results<5.0))
    print('avg RMSD : ', np.mean(rmsd_results))
[7]
predicted_sdf_path = f'{output_path}/cache'
reference_sdf_path = ligand_path

### cal sym rmsd metrics
rmsd_results = get_sym_rmsd(predicted_sdf_path, reference_sdf_path, meta_info_file)
print_results(rmsd_results)
calc rmsd for: 
pdb code: ['5S8I', '5SAK'] 
lig code: ['2LY', 'ZRY']
**************************************************
results length: 2
RMSD < 1.0 :  0.5
RMSD < 1.5 :  1.0
RMSD < 2.0 :  1.0
RMSD < 3.0 :  1.0
RMSD < 5.0 :  1.0
avg RMSD :  0.8659898326762625

Prediction Structure Visualization

[8]
!pip install py3Dmol
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: py3Dmol in /opt/conda/lib/python3.8/site-packages (2.2.1)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
[9]
import py3Dmol
pdb_id = '5SAK'
lig_id = 'ZRY'
pdb_path = os.path.join(protein_path, f'{pdb_id}.pdb')
# use a new name so the ligand_path directory defined earlier is not overwritten
pred_ligand_path = os.path.join(predicted_sdf_path, f'{pdb_id}.ligand.sdf')
gt_ligand_path = os.path.join(reference_sdf_path, f'{pdb_id}_{lig_id}.sdf')

def read_text(path):
    with open(path, 'r') as f:
        return f.read()

view = py3Dmol.view()
view.removeAllModels()

# first view: protein (cartoon and surface) with the predicted ligand
view.addModel(read_text(pdb_path), format='pdb')
view.setStyle({'cartoon': {'arrows':True, 'tubes':False, 'style':'oval', 'color':'white'}})
view.addSurface(py3Dmol.VDW, {'opacity':0.5, 'color':'white'})

view.addModel(read_text(pred_ligand_path), format='sdf')
pred_m = view.getModel()
pred_m.setStyle({}, {'stick':{'colorscheme':'greenCarbon', 'radius':0.2}})

view.zoomTo(viewer=(100,0))
view.show()

view.removeAllModels()

# second view: predicted ligand (green) overlaid on the crystal ligand (red)
view.addModel(read_text(pred_ligand_path), format='sdf')
pred_m = view.getModel()
pred_m.setStyle({}, {'stick':{'colorscheme':'greenCarbon', 'radius':0.2}})

view.addModel(read_text(gt_ligand_path), format='sdf')
ref_m = view.getModel()
ref_m.setStyle({}, {'stick':{'colorscheme':'redCarbon', 'radius':0.2}})

view.zoomTo(viewer=(100,0))
view.show()

In the images above:

The green molecule is the structure predicted by Uni-Mol.

The red molecule is the crystal structure.
