Drug-target complex binding affinity prediction with unimol

nickkk

Published 2023-11-09

Recommended image: phiformer:2

Recommended machine type: c8_m32_1 * NVIDIA V100

Author: Zhe WANG

Date: November 9, 2023

License: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

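In "batched target fishing with TransformerCPI" we used TransformerCPI to predict binding affinity. TransformerCPI, however, predicts affinity from the protein sequence and the ligand SMILES string. If the structure of the bound complex is available, can we predict the affinity from that structure instead? This notebook shows how to make that prediction with unimol.

Extract the sample archive. You can also upload your own files, as long as each complex has its own directory containing PID_protein.pdb and PID_ligand.mol2, where PID is the complex ID used as the directory name (3cz1 and 3e93 in the sample).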

[1]

! tar -xzvf /workspace/input.tar.gz -C /workspace

input/
input/3cz1/
input/3cz1/3cz1_protein.pdb
input/3cz1/3cz1_ligand.mol2
input/3cz1/3cz1_ligand.sdf
input/3cz1/3cz1_pocket.pdb
input/3e93/
input/3e93/feature_attn_t6_ssgnn.pkl
input/3e93/feature_attn_t6_padded.pkl
input/3e93/3e93_ligand.sdf
input/3e93/3e93_protein.mol2
input/3e93/3e93_protein.pdb
input/3e93/feature_attn_t6_init.pkl
input/3e93/3e93_ligand_opt.mol2
input/3e93/3e93_ligand.mol2
input/3e93/3e93_pocket.pdb
input/3e93/feature_attn_t6.pkl
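
Before preprocessing your own upload, it can help to confirm that every complex directory has the two required files. The following is a minimal sketch, assuming the layout shown in the listing above (one subdirectory per complex, named after its ID). The helper name `find_incomplete_complexes` and the ID `1abc` are hypothetical and used only for illustration.

```python
from pathlib import Path
import tempfile

# File name suffixes the notebook says each complex directory must provide
REQUIRED_SUFFIXES = ("_protein.pdb", "_ligand.mol2")


def find_incomplete_complexes(input_dir):
    """Return {complex_id: [missing file names]} for incomplete subdirectories."""
    missing = {}
    for complex_dir in sorted(Path(input_dir).iterdir()):
        if not complex_dir.is_dir():
            continue
        pid = complex_dir.name
        absent = [
            pid + suffix
            for suffix in REQUIRED_SUFFIXES
            if not (complex_dir / (pid + suffix)).is_file()
        ]
        if absent:
            missing[pid] = absent
    return missing


# Demonstration on a throwaway directory that mimics the sample layout.
# "1abc" is a made-up ID that deliberately lacks its ligand file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    layout = {
        "3cz1": ["3cz1_protein.pdb", "3cz1_ligand.mol2"],
        "1abc": ["1abc_protein.pdb"],
    }
    for pid, files in layout.items():
        (root / pid).mkdir()
        for name in files:
            (root / pid / name).touch()
    report = find_incomplete_complexes(root)

print(report)
```

On a real run you would call `find_incomplete_complexes("/workspace/input")` and fix any reported directories before running the preprocessing script.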

Use the provided script to convert the data to LMDB format.

[2]

! python /workspace/preprocess.py --input_path /workspace/input

Namespace(input_path='/workspace/input')
/workspace/input
0it [00:00, ?it/s]3e93
1it [00:01,  1.36s/it]3cz1
2it [00:01,  1.02it/s]

Load the saved model arguments.

[3]

import logging
import os
import sys
import pickle

import torch
from unicore import checkpoint_utils, distributed_utils, options, utils
from unicore.logging import progress_bar
from unicore import tasks

# Load the saved inference arguments (read below as attributes, e.g. args.batch_size)
with open('/workspace/args.pkl', 'rb') as f:
    args = pickle.load(f)

/opt/conda/lib/python3.8/site-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
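
The later cells read fields such as `args.batch_size`, `args.fp16`, `args.cpu`, and `args.results_path` from the loaded object. The sketch below shows one way to inspect those fields and override a few of them before inference. It assumes `args.pkl` holds an `argparse.Namespace`, which the source does not state explicitly. The helper `load_args` and all values in the demonstration are hypothetical.

```python
import argparse
import pickle
import tempfile
from pathlib import Path


def load_args(pickle_path, **overrides):
    """Load a pickled argparse.Namespace and override selected fields."""
    with open(pickle_path, "rb") as f:
        args = pickle.load(f)  # only unpickle files from a trusted source
    for key, value in overrides.items():
        if not hasattr(args, key):
            raise AttributeError(f"args has no field named {key!r}")
        setattr(args, key, value)
    return args


# Stand-in for /workspace/args.pkl with made-up values (hypothetical)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "args.pkl"
    demo_args = argparse.Namespace(
        batch_size=16, fp16=True, cpu=False, results_path="results"
    )
    with open(demo_path, "wb") as f:
        pickle.dump(demo_args, f)
    loaded = load_args(demo_path, batch_size=4)

# List every field so it is clear what the later cells will use
for name, value in sorted(vars(loaded).items()):
    print(f"{name} = {value!r}")
```

Rejecting unknown keys guards against typos such as `batchsize`, which would otherwise add a new, unused attribute without any warning.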

Configure the GPU and a few other required settings.

[4]

assert (
    args.batch_size is not None
), "Must specify batch size with --batch-size"

use_fp16 = args.fp16
use_cuda = torch.cuda.is_available() and not args.cpu

if use_cuda:
    torch.cuda.set_device(args.device_id)

# Single-process inference: one data shard, rank 0
data_parallel_world_size = 1
data_parallel_rank = 0

Register the task and model, then load the trained checkpoint.

[6]

import sys

# Make the bundled unimol code importable
sys.path.append('/workspace/pgmn/unimol')
sys.path.insert(0, '/workspace/pgmn')

import unimol.tasks  # importing this module registers the task and model

# Load the checkpoint on CPU, then build the task and model from args
state = checkpoint_utils.load_checkpoint_to_cpu(args.path)
task = tasks.setup_task(args)
model = task.build_model(args)
model.load_state_dict(state["model"], strict=False)

if use_fp16:
    model.half()
if use_cuda:
    model.cuda()
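
With `strict=False`, `load_state_dict` silently skips parameters that do not match between the checkpoint and the model, so a wrong checkpoint can load without an error. PyTorch returns the mismatched keys, and a minimal sketch of reporting them is shown below. The toy module and state dict are hypothetical stand-ins, not the unimol weights.

```python
import torch
from torch import nn


def load_and_report(model, state_dict):
    """Load weights non-strictly and return the keys that did not line up."""
    result = model.load_state_dict(state_dict, strict=False)
    return list(result.missing_keys), list(result.unexpected_keys)


# Toy module and checkpoint (hypothetical): the checkpoint lacks "bias"
# and carries an extra entry the module does not know about.
toy_model = nn.Linear(4, 2)
toy_state = {"weight": torch.zeros(2, 4), "extra.weight": torch.zeros(1)}

missing, unexpected = load_and_report(toy_model, toy_state)
print("missing keys:", missing)
print("unexpected keys:", unexpected)
```

In the notebook, the same check would be `load_and_report(model, state["model"])`. A long list of missing keys usually means the checkpoint and the model definition do not belong together.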

Run model inference.

[8]

import pandas as pd

for subset in args.valid_subset.split(","):
    try:
        task.load_dataset(subset, combine=False, epoch=1)
        dataset = task.dataset(subset)
    except KeyError:
        raise Exception("Cannot find dataset: " + subset)

    if not os.path.exists(args.results_path):
        os.makedirs(args.results_path)
    fname = (args.path).split("/")[-2]
    save_path = os.path.join(args.results_path, fname + "_" + subset + ".out.pkl")

    # Initialize data iterator
    itr = task.get_batch_iterator(
        dataset=dataset,
        batch_size=args.batch_size,
        ignore_invalid_inputs=True,
        required_batch_size_multiple=args.required_batch_size_multiple,
        seed=args.seed,
        num_shards=data_parallel_world_size,
        shard_id=data_parallel_rank,
        num_workers=args.num_workers,
        data_buffer_size=args.data_buffer_size,
    ).next_epoch_itr(shuffle=False)
    progress = progress_bar.progress_bar(
        itr,
        log_format=args.log_format,
        log_interval=args.log_interval,
        prefix=f"valid on '{subset}' subset",
        default_log_format=("tqdm" if not args.no_progress_bar else "simple"),
    )
    log_outputs = []

    # Collect predictions from every batch. Assigning the DataFrame columns
    # inside the loop would keep only the last batch.
    names, affinities = [], []
    for i, sample in enumerate(progress):
        sample = utils.move_to_cuda(sample) if use_cuda else sample
        net_output = model(**sample["net_input"]).detach().cpu().numpy()
        names.extend(sample['pocket_name'])
        affinities.extend(list(net_output))

    df = pd.DataFrame({'complex': names, 'affinity': affinities})
    df.to_csv(os.path.join(args.results_path, 'res.csv'), index=False)
    # logger.info("Done inference! ")
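
The inference cell calls the model directly. The source does not show whether the model has been switched to evaluation mode. A freshly built PyTorch module starts in training mode, where dropout is active and predictions vary between runs. The sketch below illustrates the usual pattern of `model.eval()` plus `torch.no_grad()` on a toy network. The function `predict_batches` and the toy model are hypothetical and unrelated to the unimol architecture.

```python
import torch
from torch import nn


def predict_batches(model, batches, device="cpu"):
    """Run inference over input tensors and return a flat list of floats."""
    model.eval()  # disable dropout and use running normalization statistics
    predictions = []
    with torch.no_grad():  # no autograd bookkeeping during inference
        for batch in batches:
            output = model(batch.to(device))
            predictions.extend(output.reshape(-1).cpu().tolist())
    return predictions


# Toy regression model with dropout (hypothetical stand-in)
toy_model = nn.Sequential(nn.Linear(8, 16), nn.Dropout(p=0.5), nn.Linear(16, 1))
toy_batches = [torch.randn(2, 8), torch.randn(3, 8)]

first = predict_batches(toy_model, toy_batches)
second = predict_batches(toy_model, toy_batches)
print(len(first), first == second)
```

In the notebook, the equivalent change would be to call `model.eval()` once before the loop and wrap the loop body in `with torch.no_grad():`, which also makes the `.detach()` call unnecessary.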

Inspect the results.

[9]

	complex	affinity
0	3e93	4.613281
1	3cz1	4.531250
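
The code that produced this table is not included in the source. As a rough sketch, the saved `res.csv` could be read back and the complexes ranked by predicted value, as shown below using only the standard library. The CSV text is an assumed reconstruction from the table above. The source does not state the unit or direction of the affinity score, so treating a higher value as stronger binding is also an assumption.

```python
import csv
import io

# Assumed contents of res.csv for the two sample complexes (values from the table)
res_csv = "complex,affinity\n3e93,4.613281\n3cz1,4.531250\n"


def rank_by_affinity(handle):
    """Return (complex, affinity) pairs sorted from highest to lowest value."""
    rows = [
        (row["complex"], float(row["affinity"]))
        for row in csv.DictReader(handle)
    ]
    return sorted(rows, key=lambda item: item[1], reverse=True)


ranking = rank_by_affinity(io.StringIO(res_csv))
for name, value in ranking:
    print(f"{name}\t{value:.3f}")
```

To rank a real run, open the file instead, for example `with open(os.path.join(args.results_path, "res.csv")) as f: rank_by_affinity(f)`.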
