[Uni-Dock Demo] Uni-Dock High-Performance Molecular Docking Engine: Usage Example

Author: zhengh@dp.tech
Published: 2023-06-18
Tags: Uni-Dock, molecular docking, cheminformatics

Yu, Y., Cai, C., Wang, J., Bo, Z., Zhu, Z., & Zheng, H. (2023). Uni-Dock: GPU-Accelerated Docking Enables Ultralarge Virtual Screening. Journal of Chemical Theory and Computation.

June 15, 2023. In a cover article published on June 13, 2023 in the Journal of Chemical Theory and Computation, "Uni-Dock: GPU-Accelerated Docking Enables Ultralarge Virtual Screening", DP Technology introduced Uni-Dock, a GPU-accelerated high-performance molecular docking engine. While preserving the original computational accuracy, Uni-Dock runs molecular docking on an NVIDIA V100 GPU more than 1,600 times faster than on a single CPU core. Using Uni-Dock on a cluster of 100 NVIDIA V100 GPUs, the research team completed a multistage virtual screening of 38.2 million compounds from the Enamine Diverse REAL drug-like library against the KRAS G12D target in just 11.3 hours, at an average speed of more than 37,000 docking runs per GPU-hour. This work substantially reduces the time and cost of ultra-large virtual screening and provides a reliable way to explore larger chemical spaces efficiently in the early stages of drug discovery.

**The Uni-Dock high-performance molecular docking engine is now freely available!** Subject to the license agreement, users can download the latest release of Uni-Dock from the Uni-Dock release page of DP Technology's GitHub repository.

In this tutorial, you will learn how to download and install Uni-Dock, run a molecular docking task with it, and perform a simple analysis of the results.

Quick start: click the Start Connection button at the top, choose the bohrium-notebook:05-31 image and any GPU node (c12_m92_1 * NVIDIA V100 is recommended), and wait a moment for the notebook to become ready.

1. [Installation] Download and install Uni-Dock

!wget https://github.com/dptech-corp/Uni-Dock/releases/download/1.0.0/unidock
--2023-07-03 13:36:03--  https://github.com/dptech-corp/Uni-Dock/releases/download/1.0.0/unidock
Resolving ga.dp.tech (ga.dp.tech)... 10.255.254.37, 10.255.254.18, 10.255.254.7
Connecting to ga.dp.tech (ga.dp.tech)|10.255.254.37|:8118... connected.
Proxy request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/645746447/8e0bdc58-8f55-4d38-923d-3b3fe31dfcd3?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20230703%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230703T053603Z&X-Amz-Expires=300&X-Amz-Signature=f38623dbd92f92a9b55debe44e4883bcf95edb7ddd393fb850ad1f91bde445dd&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=645746447&response-content-disposition=attachment%3B%20filename%3Dunidock&response-content-type=application%2Foctet-stream [following]
--2023-07-03 13:36:03--  https://objects.githubusercontent.com/github-production-release-asset-2e65be/645746447/8e0bdc58-8f55-4d38-923d-3b3fe31dfcd3?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20230703%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230703T053603Z&X-Amz-Expires=300&X-Amz-Signature=f38623dbd92f92a9b55debe44e4883bcf95edb7ddd393fb850ad1f91bde445dd&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=645746447&response-content-disposition=attachment%3B%20filename%3Dunidock&response-content-type=application%2Foctet-stream
Connecting to ga.dp.tech (ga.dp.tech)|10.255.254.37|:8118... connected.
Proxy request sent, awaiting response... 200 OK
Length: 8542312 (8.1M) [application/octet-stream]
Saving to: ‘unidock’

unidock             100%[===================>]   8.15M  4.40MB/s    in 1.9s    

2023-07-03 13:36:06 (4.40 MB/s) - ‘unidock’ saved [8542312/8542312]

After making the downloaded binary executable (and, optionally, adding its directory to the PATH environment variable), Uni-Dock is ready to use.

!chmod +x unidock
!./unidock --help
Uni-Dock v0.1.0

Input:
  --receptor arg             rigid part of the receptor (PDBQT)
  --flex arg                 flexible side chains, if any (PDBQT)
  --ligand arg               ligand (PDBQT)
  --ligand_index arg         file containing paths to ligands
  --batch arg                batch ligand (PDBQT)
  --gpu_batch arg            gpu batch ligand (PDBQT)
  --scoring arg (=vina)      scoring function (ad4, vina or vinardo)

Search space (required):
  --maps arg                 affinity maps for the autodock4.2 (ad4) or vina 
                             scoring function
  --center_x arg             X coordinate of the center (Angstrom)
  --center_y arg             Y coordinate of the center (Angstrom)
  --center_z arg             Z coordinate of the center (Angstrom)
  --size_x arg               size in the X dimension (Angstrom)
  --size_y arg               size in the Y dimension (Angstrom)
  --size_z arg               size in the Z dimension (Angstrom)
  --autobox                  set maps dimensions based on input ligand(s) (for 
                             --score_only and --local_only)

Output (optional):
  --out arg                  output models (PDBQT), the default is chosen based
                             on the ligand file name
  --dir arg                  output directory for batch mode
  --write_maps arg           output filename (directory + prefix name) for 
                             maps. Option --force_even_voxels may be needed to 
                             comply with .map format

Misc (optional):
  --cpu arg (=0)             the number of CPUs to use (the default is to try 
                             to detect the number of CPUs or, failing that, use
                             1)
  --seed arg (=0)            explicit random seed
  --exhaustiveness arg (=8)  exhaustiveness of the global search (roughly 
                             proportional to time): 1+
  --max_evals arg (=0)       number of evaluations in each MC run (if zero, 
                             which is the default, the number of MC steps is 
                             based on heuristics)
  --num_modes arg (=9)       maximum number of binding modes to generate
  --min_rmsd arg (=1)        minimum RMSD between output poses
  --energy_range arg (=3)    maximum energy difference between the best binding
                             mode and the worst one displayed (kcal/mol)
  --spacing arg (=0.375)     grid spacing (Angstrom)
  --verbosity arg (=1)       verbosity (0=no output, 1=normal, 2=verbose)
  --max_step arg (=0)        maximum number of steps in each MC run (if zero, 
                             which is the default, the number of MC steps is 
                             based on heuristics)
  --refine_step arg (=5)     number of steps in refinement, default=5
  --max_gpu_memory arg (=0)  maximum gpu memory to use (default=0, use all 
                             available GPU memory to optain maximum batch size)
  --search_mode arg          search mode of vina (fast, balance, detail), using
                             recommended settings of exhaustiveness and search 
                             steps; the higher the computational complexity, 
                             the higher the accuracy, but the larger the 
                             computational cost

Configuration file (optional):
  --config arg               the above options can be put here

Information (optional):
  --help                     display usage summary
  --help_advanced            display usage summary with advanced options
  --version                  display program version
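
The `--config` option listed above suggests that the docking parameters can be kept in a file instead of on the command line. The sketch below writes such a file under the assumption that Uni-Dock accepts the same `name = value` format as AutoDock Vina configuration files; this is not confirmed by the help text. The option names come from the help output, the values are the ones used later in this tutorial, and the file name `unidock.conf` is hypothetical.

```python
# Assumption: Uni-Dock reads Vina-style "name = value" configuration files.
options = {
    "receptor": "Uni-Dock/example/screening_test/indata/def.pdbqt",
    "ligand_index": "def_ligands.index",
    "center_x": -36.01,
    "center_y": 25.63,
    "center_z": 67.49,
    "size_x": 17.20,
    "size_y": 14.38,
    "size_z": 12.24,
    "scoring": "vinardo",
}

# One "name = value" pair per line
config_text = "\n".join(f"{name} = {value}" for name, value in options.items()) + "\n"

# Hypothetical file name; it would be passed as: ./unidock --config unidock.conf
with open("unidock.conf", "w") as fh:
    fh.write(config_text)

print(config_text)
```

If the format assumption holds, keeping the box definition in one file makes it easier to reuse the same search space across runs.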


2. [Run Docking using Uni-Dock] Run molecular docking with Uni-Dock

2.1 [Download Datasets] Download the test dataset

!git clone https://github.com/dptech-corp/Uni-Dock.git
!tar -xvf Uni-Dock/example/screening_test/indata/def.tar.bz2 --no-same-owner > tarlog.out
Cloning into 'Uni-Dock'...
remote: Enumerating objects: 71, done.
remote: Counting objects: 100% (71/71), done.
remote: Compressing objects: 100% (45/45), done.
remote: Total 71 (delta 19), reused 69 (delta 19), pack-reused 0
Unpacking objects: 100% (71/71), 24.51 MiB | 2.56 MiB/s, done.

import os

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# List the ligand files in the "def_unique_charged" folder.
# os.listdir raises FileNotFoundError if the archive was not extracted,
# instead of silently leaving `files` undefined.
ligand_dir = "def_unique_charged"
files = sorted(f for f in os.listdir(ligand_dir)
               if os.path.isfile(os.path.join(ligand_dir, f)))
nligands = len(files)

# Show the number of active and decoy molecules in the dataset
plt.figure(figsize=(6, 4), dpi=100)
font = {"family": "serif",
        "color": "black",
        "weight": "normal",
        "size": 15}
ax = sns.countplot(x="type", data=pd.DataFrame(
    {"type": ["Active" if f.startswith("active") else "Decoy" for f in files]}))
for container in ax.containers:
    ax.bar_label(container)
plt.ylabel("Number of Ligands", fontdict=font)
plt.ylim(0, 6500)
plt.xlabel("Type of Ligands", fontdict=font)
plt.title("Active and Decoy Ligands", fontdict=font)
plt.show()

As the plot shows, the test dataset contains 102 active molecules and 5,696 decoy (inactive) molecules, 5,798 ligands in total.

2.2 [Prepare Command] Prepare the Uni-Dock command-line scripts

Uni-Dock defines three levels of computational complexity, named from low to high Fast mode, Balanced mode, and Detailed mode.

  • Fast mode is the fastest with slightly lower accuracy; docking takes about 0.10 s/ligand;
  • Balanced mode balances speed and accuracy; docking takes about 0.32 s/ligand;
  • Detailed mode is somewhat slower but more accurate; docking takes about 0.42 s/ligand.

# List the paths of all ligands in a ligand index file, one path per line
with open("def_ligands.index", "w") as index_file:
    index_file.write("\n".join(os.path.join("def_unique_charged", name) for name in files))
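
Before launching a long docking run, it can be worth checking that every path in the index file actually exists. The following is a minimal sketch of such a check; `missing_ligands` is a hypothetical helper, not part of Uni-Dock, and the demonstration uses a throwaway directory with made-up ligand names rather than the real dataset.

```python
import os
import tempfile


def missing_ligands(index_path: str) -> list:
    """Return the paths listed in the index file that do not exist on disk."""
    with open(index_path) as fh:
        paths = [line.strip() for line in fh if line.strip()]
    return [p for p in paths if not os.path.isfile(p)]


# Demonstration with hypothetical ligand names in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    present = os.path.join(tmp, "active_1.pdbqt")
    absent = os.path.join(tmp, "decoy_1.pdbqt")
    open(present, "w").close()  # create only one of the two files

    demo_index = os.path.join(tmp, "demo.index")
    with open(demo_index, "w") as fh:
        fh.write("\n".join([present, absent]))

    missing_names = [os.path.basename(p) for p in missing_ligands(demo_index)]

print(missing_names)
```

In the tutorial itself, the same check would be run as `missing_ligands("def_ligands.index")` and should return an empty list.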

import os

# Configuration dictionary with the parameters required for docking
config = {
    "target": "def",  # Protein target name
    "receptor": "Uni-Dock/example/screening_test/indata/def.pdbqt",  # Protein receptor file path
    "ligand_index": "def_ligands.index",  # Ligand index file path
    "seed": 5,  # Random seed
    "sf": "vinardo",  # Scoring function [vina, vinardo]
    "center_x": -36.01,  # Docking box center, x coordinate
    "center_y": 25.63,  # Docking box center, y coordinate
    "center_z": 67.49,  # Docking box center, z coordinate
    "size_x": 17.20,  # Docking box size along x
    "size_y": 14.38,  # Docking box size along y
    "size_z": 12.24,  # Docking box size along z
}

# The search mode is set per run in the loop below.
# Note: the --help output above lists the modes as fast, balance, detail.
for search_mode in ["fast", "balanced", "detail"]:
    out_dir = f"results/{config['target']}-{search_mode}"

    # Build the command string by appending each parameter
    cmd = "./unidock "
    cmd += f"--receptor {config['receptor']} "  # Protein receptor file
    cmd += f"--ligand_index {config['ligand_index']} "  # Ligand index file
    cmd += f"--center_x {config['center_x']:.2f} "  # Docking box center x
    cmd += f"--center_y {config['center_y']:.2f} "  # Docking box center y
    cmd += f"--center_z {config['center_z']:.2f} "  # Docking box center z
    cmd += f"--size_x {config['size_x']:.2f} "  # Docking box size x
    cmd += f"--size_y {config['size_y']:.2f} "  # Docking box size y
    cmd += f"--size_z {config['size_z']:.2f} "  # Docking box size z
    cmd += f"--scoring {config['sf']} "  # Scoring function
    cmd += "--refine_step 3 "  # Number of refinement steps
    cmd += "--num_modes 1 "  # Number of binding modes to save
    cmd += f"--seed {config.get('seed', 42)} "  # Random seed
    cmd += f"--search_mode {search_mode} "  # Docking search mode
    cmd += f"--dir {out_dir} "  # Output directory

    # Write the command to "rundock-<search_mode>.sh" for later execution
    with open(f"rundock-{search_mode}.sh", "w") as f:
        f.write(cmd)

    # Print the command for debugging purposes
    print(f"\n#### search mode: [{search_mode}] ####")
    print(cmd)

    # Create the output directory if it doesn't exist
    os.makedirs(out_dir, exist_ok=True)
#### search mode: [fast] ####
./unidock --receptor Uni-Dock/example/screening_test/indata/def.pdbqt --ligand_index def_ligands.index --center_x -36.01 --center_y 25.63 --center_z 67.49 --size_x 17.20 --size_y 14.38 --size_z 12.24 --scoring vinardo --refine_step 3 --num_modes 1 --seed 5 --search_mode fast --dir results/def-fast 

#### search mode: [balanced] ####
./unidock --receptor Uni-Dock/example/screening_test/indata/def.pdbqt --ligand_index def_ligands.index --center_x -36.01 --center_y 25.63 --center_z 67.49 --size_x 17.20 --size_y 14.38 --size_z 12.24 --scoring vinardo --refine_step 3 --num_modes 1 --seed 5 --search_mode balanced --dir results/def-balanced 

#### search mode: [detail] ####
./unidock --receptor Uni-Dock/example/screening_test/indata/def.pdbqt --ligand_index def_ligands.index --center_x -36.01 --center_y 25.63 --center_z 67.49 --size_x 17.20 --size_y 14.38 --size_z 12.24 --scoring vinardo --refine_step 3 --num_modes 1 --seed 5 --search_mode detail --dir results/def-detail 
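
As an alternative to concatenating strings, the same command can be assembled as an argument list, which avoids quoting problems if a path ever contains spaces. This is only a sketch: `build_unidock_command` is a hypothetical helper, and the flags are the ones already used in the commands above.

```python
import shlex


def build_unidock_command(config: dict, search_mode: str) -> list:
    """Hypothetical helper: assemble the unidock call as an argument list."""
    args = ["./unidock",
            "--receptor", config["receptor"],
            "--ligand_index", config["ligand_index"]]
    # Box center and size, formatted with two decimals as in the tutorial
    for key in ("center_x", "center_y", "center_z", "size_x", "size_y", "size_z"):
        args += [f"--{key}", f"{config[key]:.2f}"]
    args += ["--scoring", config["sf"],
             "--refine_step", "3",
             "--num_modes", "1",
             "--seed", str(config.get("seed", 42)),
             "--search_mode", search_mode,
             "--dir", f"results/{config['target']}-{search_mode}"]
    return args


config = {
    "target": "def",
    "receptor": "Uni-Dock/example/screening_test/indata/def.pdbqt",
    "ligand_index": "def_ligands.index",
    "seed": 5,
    "sf": "vinardo",
    "center_x": -36.01, "center_y": 25.63, "center_z": 67.49,
    "size_x": 17.20, "size_y": 14.38, "size_z": 12.24,
}

args = build_unidock_command(config, "fast")
command_line = shlex.join(args)  # shell-safe string, requires Python 3.8+
print(command_line)
```

The list can be passed directly to `subprocess.run(args, check=True)`, which also raises an error if the docking process exits with a non-zero status.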

2.3 [Run docking] Run molecular docking with Uni-Dock

  • Fast mode takes approximately 10 minutes;
  • Balanced mode takes approximately 30 minutes;
  • Detail mode takes approximately 40 minutes.
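
These runtimes are consistent with a simple back-of-the-envelope estimate: the number of ligands times the approximate per-ligand docking time. The sketch below assumes the 5,798-ligand test set and per-ligand speeds of roughly 0.10, 0.32 and 0.42 s for the three modes; actual runtimes depend on the hardware.

```python
n_ligands = 5798  # size of the test set

# Approximate per-ligand docking times in seconds (assumed round figures)
seconds_per_ligand = {"fast": 0.10, "balanced": 0.32, "detail": 0.42}

# Estimated wall-clock time per mode, in minutes
estimated_minutes = {mode: n_ligands * s / 60 for mode, s in seconds_per_ligand.items()}

for mode, minutes in estimated_minutes.items():
    print(f"{mode}: about {minutes:.0f} min")
```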

import os
import time

# Wall-clock timings per search mode (kept separate from the docking scores collected later)
timings = {}
for search_mode in ["fast", "balanced", "detail"]:
    start_time = time.time()
    # Each run overwrites unidocklog.out with the log of the latest mode
    os.system(f"bash rundock-{search_mode}.sh > unidocklog.out")
    spend_time = time.time() - start_time
    timings[search_mode] = {
        "total_time": spend_time,
        "average_time": spend_time / nligands,
    }
    print(f"#### Uni-Dock {search_mode} mode ####")
    print(f"Number of Ligands: {nligands}")
    print(f"Total Time: {spend_time:.4f} s")
    print(f"Average Time: {spend_time/nligands:.4f} s/ligand ({nligands/spend_time:.1f} ligands/s)")
#### Uni-Dock fast mode ####
Number of Ligands: 5798
Total Time: 448.4152 s
Average Time: 0.0773 s/ligand (12.9 ligands/s)
#### Uni-Dock balanced mode ####
Number of Ligands: 5798
Total Time: 1743.1072 s
Average Time: 0.3006 s/ligand (3.3 ligands/s)
#### Uni-Dock detail mode ####
Number of Ligands: 5798
Total Time: 2465.3312 s
Average Time: 0.4252 s/ligand (2.4 ligands/s)
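
To compare these timings with throughput figures quoted per GPU-hour, the measured totals can be converted as follows. This sketch assumes the run used a single GPU and takes the total times from the output above; the times include start-up and file I/O, so the pure docking throughput is somewhat higher.

```python
n_ligands = 5798

# Total wall-clock time per mode in seconds, copied from the output above
total_seconds = {"fast": 448.4152, "balanced": 1743.1072, "detail": 2465.3312}

# Ligands docked per GPU-hour, assuming one GPU was used
ligands_per_gpu_hour = {mode: n_ligands / t * 3600 for mode, t in total_seconds.items()}

for mode, rate in ligands_per_gpu_hour.items():
    print(f"{mode}: {rate:,.0f} ligands per GPU-hour")
```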

2.4 [Analysis] Result analysis

We use the **Enrichment Factor (EF)** to characterize how well molecular docking retrieves active compounds, that is, how much better the screening performs than random selection. For the top x% of the ranked ligands, EF is the ratio between the fraction of actives among those top-ranked ligands and the fraction of actives in the entire dataset:

$$\mathrm{EF}_{x\%} = \frac{N_{\mathrm{active}}^{x\%} / N^{x\%}}{N_{\mathrm{active}}^{\mathrm{total}} / N^{\mathrm{total}}}$$

Here $N^{x\%} = x\% \cdot N^{\mathrm{total}}$ is the number of top-ranked ligands selected, $N_{\mathrm{active}}^{x\%}$ is the number of actives among them, and $N_{\mathrm{active}}^{\mathrm{total}}$ and $N^{\mathrm{total}}$ are the numbers of actives and of all ligands in the dataset.
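
The definition above can be checked on a toy example. The sketch below uses made-up scores for ten ligands, two of them active; `enrichment_factor` is an illustrative helper that ranks by score with lower (more negative) values first, as docking scores are usually read.

```python
import pandas as pd


def enrichment_factor(df: pd.DataFrame, ratio: float) -> float:
    """EF for the top `ratio` fraction of ligands, ranked by ascending score."""
    ranked = df.sort_values(by="score")
    n_total = len(ranked)
    n_top = int(n_total * ratio)
    actives_total = (ranked["type"] == "active").sum()
    actives_top = (ranked.iloc[:n_top]["type"] == "active").sum()
    # (fraction of actives in the selection) / (fraction of actives overall)
    return (actives_top / (n_total * ratio)) / (actives_total / n_total)


# Made-up data: 10 ligands, 2 actives, one of which ranks in the top 20%
toy = pd.DataFrame({
    "score": [-9.0, -8.5, -8.0, -7.5, -7.0, -6.5, -6.0, -5.5, -5.0, -4.5],
    "type": ["active", "decoy", "decoy", "active", "decoy",
             "decoy", "decoy", "decoy", "decoy", "decoy"],
})

ef20 = enrichment_factor(toy, 0.2)
print(f"EF20% = {ef20:.2f}")  # (1/2) / (2/10) = 2.50
```

An EF of 1 corresponds to random selection; here the top 20% contains actives at 2.5 times the background rate.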

import glob
import os

# Collect the docking score of every output pose, per search mode
results = {}

for search_mode in ["fast", "balanced", "detail"]:
    results[search_mode] = {"name": [], "pose": [], "score": [], "type": []}
    result_files = glob.glob(os.path.join(f"results/{config['target']}-{search_mode}", "*.pdbqt"))
    for result_file in result_files:
        basename = os.path.basename(result_file)
        with open(result_file, "r") as f:
            lines = f.readlines()
        pose = 1
        for line in lines:
            # Each pose carries a "REMARK VINA RESULT:" line; the score is the fourth token
            if line.startswith("REMARK VINA RESULT:"):
                results[search_mode]["name"].append(basename.split("_out.pdbqt")[0])
                results[search_mode]["pose"].append(pose)
                pose += 1
                results[search_mode]["score"].append(float(line.split()[3]))
                results[search_mode]["type"].append("active" if basename.startswith("active") else "decoy")
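
The parsing step can be tried in isolation on a small string. The excerpt below is made up for illustration and is not real Uni-Dock output; only the `REMARK VINA RESULT:` prefix and the position of the score are taken from the code above, and `parse_vina_scores` is a hypothetical helper.

```python
from typing import List


def parse_vina_scores(pdbqt_text: str) -> List[float]:
    """Return the score from every 'REMARK VINA RESULT:' line, in file order."""
    scores = []
    for line in pdbqt_text.splitlines():
        if line.startswith("REMARK VINA RESULT:"):
            # Tokens: ["REMARK", "VINA", "RESULT:", "<score>", ...]
            scores.append(float(line.split()[3]))
    return scores


# Made-up two-pose excerpt, for illustration only
sample = """MODEL 1
REMARK VINA RESULT:    -8.123      0.000      0.000
ENDMDL
MODEL 2
REMARK VINA RESULT:    -7.456      1.234      2.345
ENDMDL
"""

scores = parse_vina_scores(sample)
print(scores)  # [-8.123, -7.456]
```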

import pandas as pd

def calc_enrichment_factor(data: pd.DataFrame, ratio: float = 0.1) -> float:
    # Rank ligands by docking score; lower (more negative) scores are better
    data = data.sort_values(by="score")
    nlig = len(data)
    nactive_total = sum(data["type"] == "active")
    # Count the actives among the top `ratio` fraction of the ranking
    nactive = sum(data[:int(nlig * ratio)]["type"] == "active")
    return (nactive / (nlig * ratio)) / (nactive_total / nlig)

print("Search Mode \tEF1%\tEF3%\tEF5%\tEF10%\tEF20%")
for search_mode in ["fast", "balanced", "detail"]:
    df = pd.DataFrame(results[search_mode])
    print("{} \t{:.2f}\t{:.2f}\t{:.2f}\t{:.2f}\t{:.2f}".format(
        search_mode,
        calc_enrichment_factor(df, 0.01),
        calc_enrichment_factor(df, 0.03),
        calc_enrichment_factor(df, 0.05),
        calc_enrichment_factor(df, 0.1),
        calc_enrichment_factor(df, 0.2),
    ))
Search Mode 	EF1%	EF3%	EF5%	EF10%	EF20%
fast    	19.61	8.82	6.27	3.92	2.50
balanced    	16.67	11.11	7.84	5.00	3.38
detail    	14.71	10.13	7.45	4.71	3.19
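
The EF values can be read back as counts of actives. In the calculation above the total number of ligands cancels, so EF equals the number of actives in the selection divided by (ratio × total actives). The sketch below inverts this for the EF1% column, assuming the 102 actives and 5,798 ligands of this test set, and also computes the largest EF1% the calculation can return.

```python
n_active_total = 102  # actives in the test set
n_ligands = 5798      # all ligands in the test set
ratio = 0.01          # top 1%

# EF1% column of the table above
ef_1pct = {"fast": 19.61, "balanced": 16.67, "detail": 14.71}

# EF = n_top_actives / (ratio * n_active_total), solved for n_top_actives
top_actives = {mode: round(ef * ratio * n_active_total) for mode, ef in ef_1pct.items()}

n_selected = int(n_ligands * ratio)  # ligands actually counted in the top 1%
for mode, n in top_actives.items():
    print(f"{mode}: about {n} actives among the top {n_selected} ligands")

# Upper bound: every selected ligand is an active
max_ef = (n_selected / (n_ligands * ratio)) / (n_active_total / n_ligands)
print(f"maximum possible EF1%: {max_ef:.2f}")
```

Read this way, Fast mode places about 20 of the 102 actives in the top 1% of the ranking, against roughly one expected from random selection.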