[Uni-Dock Demo] Uni-Dock High-Performance Molecular Docking Engine - Use Case
On June 13, 2023, DP Technology published the cover article "Uni-Dock: GPU-Accelerated Docking Enables Ultralarge Virtual Screening" in the Journal of Chemical Theory and Computation, introducing Uni-Dock, a GPU-accelerated high-performance molecular docking engine. While preserving the original computational accuracy, Uni-Dock runs molecular docking more than 1,600 times faster on an NVIDIA V100 GPU than on a single CPU core. Using Uni-Dock on a cluster of 100 NVIDIA V100 GPUs, the research team completed a multistage virtual screening of the 38.2 million compounds in the Enamine Diverse REAL drug-like database against the KRAS G12D target in just 11.3 hours, at an average speed of more than 37,000 dockings per GPU-hour. This work substantially reduces the time and cost of virtual screening of ultra-large compound libraries and provides a reliable way to explore larger chemical spaces efficiently in the early stages of drug discovery.
**The Uni-Dock high-performance molecular docking engine is now freely available!** Subject to the license agreement, users can download the latest release from the Uni-Dock release page of DP Technology's GitHub repository.
This tutorial shows how to download and install Uni-Dock, run a molecular docking task with it, and perform a simple analysis of the results.
Quick start: click the Start Connection button at the top, select the bohrium-notebook:05-31 image and any GPU node (c12_m92_1 * NVIDIA V100 is recommended), and the notebook will be ready to run after a short wait.
1. [Installation] Download and install Uni-Dock
The Uni-Dock 1.0.0 release binary is downloaded from https://github.com/dptech-corp/Uni-Dock/releases/download/1.0.0/unidock. In the original run (2023-07-03), the request was redirected to GitHub's release-asset storage (objects.githubusercontent.com), and an 8.1 MB file (8,542,312 bytes) was saved as 'unidock'.
After making the downloaded file executable (for example with chmod +x unidock) and adding Uni-Dock to the PATH environment variable, you can start using the Uni-Dock high-performance molecular docking software. Its options are listed below.
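The setup step itself is only described in prose here. A minimal Python sketch of it follows, assuming the binary was saved as `unidock` by the download step; the helper name `setup_unidock` is hypothetical. The new PATH applies only to the current Python process and to commands it launches, not to other shells.

```python
import os
import shutil
import stat
from pathlib import Path


def setup_unidock(binary: str = "unidock") -> str:
    """Make a downloaded Uni-Dock binary executable and put its directory on PATH.

    Hypothetical helper; `binary` is assumed to be the file saved by the download step.
    Only this process (and child processes it starts) sees the updated PATH.
    """
    path = Path(binary).resolve()
    # Files fetched with wget are usually saved without the execute bit, so add it.
    path.chmod(path.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    # Prepend the binary's directory so the bare name `unidock` resolves first.
    os.environ["PATH"] = str(path.parent) + os.pathsep + os.environ.get("PATH", "")
    found = shutil.which(path.name)
    if found is None:
        raise FileNotFoundError(f"{path.name} is not on PATH after setup")
    return found
```

After `setup_unidock("unidock")`, a call such as `subprocess.run(["unidock", "--version"])` should find the binary without a `./` prefix.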
Uni-Dock v0.1.0

Input:
- `--receptor arg`: rigid part of the receptor (PDBQT)
- `--flex arg`: flexible side chains, if any (PDBQT)
- `--ligand arg`: ligand (PDBQT)
- `--ligand_index arg`: file containing paths to ligands
- `--batch arg`: batch ligand (PDBQT)
- `--gpu_batch arg`: gpu batch ligand (PDBQT)
- `--scoring arg (=vina)`: scoring function (ad4, vina or vinardo)

Search space (required):
- `--maps arg`: affinity maps for the autodock4.2 (ad4) or vina scoring function
- `--center_x arg`: X coordinate of the center (Angstrom)
- `--center_y arg`: Y coordinate of the center (Angstrom)
- `--center_z arg`: Z coordinate of the center (Angstrom)
- `--size_x arg`: size in the X dimension (Angstrom)
- `--size_y arg`: size in the Y dimension (Angstrom)
- `--size_z arg`: size in the Z dimension (Angstrom)
- `--autobox`: set maps dimensions based on input ligand(s) (for --score_only and --local_only)

Output (optional):
- `--out arg`: output models (PDBQT), the default is chosen based on the ligand file name
- `--dir arg`: output directory for batch mode
- `--write_maps arg`: output filename (directory + prefix name) for maps. Option --force_even_voxels may be needed to comply with .map format

Misc (optional):
- `--cpu arg (=0)`: the number of CPUs to use (the default is to try to detect the number of CPUs or, failing that, use 1)
- `--seed arg (=0)`: explicit random seed
- `--exhaustiveness arg (=8)`: exhaustiveness of the global search (roughly proportional to time): 1+
- `--max_evals arg (=0)`: number of evaluations in each MC run (if zero, which is the default, the number of MC steps is based on heuristics)
- `--num_modes arg (=9)`: maximum number of binding modes to generate
- `--min_rmsd arg (=1)`: minimum RMSD between output poses
- `--energy_range arg (=3)`: maximum energy difference between the best binding mode and the worst one displayed (kcal/mol)
- `--spacing arg (=0.375)`: grid spacing (Angstrom)
- `--verbosity arg (=1)`: verbosity (0=no output, 1=normal, 2=verbose)
- `--max_step arg (=0)`: maximum number of steps in each MC run (if zero, which is the default, the number of MC steps is based on heuristics)
- `--refine_step arg (=5)`: number of steps in refinement, default=5
- `--max_gpu_memory arg (=0)`: maximum gpu memory to use (default=0, use all available GPU memory to optain maximum batch size)
- `--search_mode arg`: search mode of vina (fast, balance, detail), using recommended settings of exhaustiveness and search steps; the higher the computational complexity, the higher the accuracy, but the larger the computational cost

Configuration file (optional):
- `--config arg`: the above options can be put here

Information (optional):
- `--help`: display usage summary
- `--help_advanced`: display usage summary with advanced options
- `--version`: display program version
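The `--config` option lets repeated settings live in a file instead of on every command line. The sketch below writes such a file, assuming Uni-Dock follows the AutoDock Vina convention of one `name = value` pair per line with option names written without leading dashes; the helper `write_unidock_config` and the file name `def_box.conf` are hypothetical. The box values are those used in the docking commands of section 2.2.

```python
from pathlib import Path


def write_unidock_config(path, options):
    """Write `option = value` lines that could be passed to Uni-Dock via --config.

    Assumption: the file follows the AutoDock Vina config convention
    (one `name = value` pair per line, no leading dashes on option names).
    """
    text = "".join(f"{name} = {value}\n" for name, value in options.items())
    Path(path).write_text(text)
    return text


# Search box used in this tutorial; values kept as strings to preserve their formatting.
SEARCH_BOX = {
    "center_x": "-36.01", "center_y": "25.63", "center_z": "67.49",
    "size_x": "17.20", "size_y": "14.38", "size_z": "12.24",
}
```

With `write_unidock_config("def_box.conf", SEARCH_BOX)`, the six box options could be replaced by `--config def_box.conf` on the command line, if the assumed file format holds.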
2. [Run Docking using Uni-Dock] Molecular docking with Uni-Dock
2.1 [Download Datasets] Download the test dataset
Cloning into 'Uni-Dock'... remote: Enumerating objects: 71, done. remote: Counting objects: 100% (71/71), done. remote: Compressing objects: 100% (45/45), done. remote: Total 71 (delta 19), reused 69 (delta 19), pack-reused 0 Unpacking objects: 100% (71/71), 24.51 MiB | 2.56 MiB/s, done.
The test dataset contains 102 active molecules and 5,696 inactive molecules, 5,798 ligands in total.
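The counting cell is not shown. A minimal Python sketch of an equivalent count follows; the directory arguments are hypothetical placeholders for wherever the cloned test set keeps its active and inactive ligands, and `*.pdbqt` files are assumed. The active ratio it returns (102/5,798, about 1.76% for this set) is the random-selection baseline used later for the enrichment factor.

```python
from pathlib import Path


def count_ligands(actives_dir, inactives_dir, pattern="*.pdbqt"):
    """Count ligand files in an actives folder and an inactives folder.

    Both directory names are hypothetical placeholders, not paths from the repository.
    """
    n_actives = sum(1 for _ in Path(actives_dir).glob(pattern))
    n_inactives = sum(1 for _ in Path(inactives_dir).glob(pattern))
    total = n_actives + n_inactives
    return {
        "actives": n_actives,
        "inactives": n_inactives,
        "total": total,
        # Fraction of actives: the baseline a random ranking would achieve.
        "active_ratio": n_actives / total if total else 0.0,
    }
```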
2.2 [Prepare Command] Prepare the Uni-Dock command lines
Uni-Dock defines three levels of computational complexity, from low to high: Fast Mode, Balanced Mode, and Detailed Mode.
- Fast Mode is the fastest, with slightly lower accuracy; docking speed is about 0.10 s/ligand;
- Balanced Mode balances speed and accuracy, at about 0.32 s/ligand;
- Detailed Mode is slower but more accurate, at about 0.42 s/ligand.
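Multiplying these per-ligand speeds by the 5,798 ligands of the test set gives a rough total runtime for each mode. The short Python sketch below does that arithmetic; it ignores start-up and I/O overhead.

```python
def estimate_runtime_minutes(n_ligands, seconds_per_ligand):
    """Total wall time in minutes = number of ligands x per-ligand docking time."""
    return n_ligands * seconds_per_ligand / 60


# Approximate per-ligand speeds quoted above for each search mode.
SPEEDS = {"fast": 0.10, "balanced": 0.32, "detail": 0.42}

for mode, speed in SPEEDS.items():
    print(f"{mode}: ~{estimate_runtime_minutes(5798, speed):.1f} min")
```

This prints about 9.7, 30.9 and 40.6 minutes, consistent with the approximate runtimes given in section 2.3.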
- search mode [fast]: `./unidock --receptor Uni-Dock/example/screening_test/indata/def.pdbqt --ligand_index def_ligands.index --center_x -36.01 --center_y 25.63 --center_z 67.49 --size_x 17.20 --size_y 14.38 --size_z 12.24 --scoring vinardo --refine_step 3 --num_modes 1 --seed 5 --search_mode fast --dir results/def-fast`
- search mode [balanced]: `./unidock --receptor Uni-Dock/example/screening_test/indata/def.pdbqt --ligand_index def_ligands.index --center_x -36.01 --center_y 25.63 --center_z 67.49 --size_x 17.20 --size_y 14.38 --size_z 12.24 --scoring vinardo --refine_step 3 --num_modes 1 --seed 5 --search_mode balanced --dir results/def-balanced`
- search mode [detail]: `./unidock --receptor Uni-Dock/example/screening_test/indata/def.pdbqt --ligand_index def_ligands.index --center_x -36.01 --center_y 25.63 --center_z 67.49 --size_x 17.20 --size_y 14.38 --size_z 12.24 --scoring vinardo --refine_step 3 --num_modes 1 --seed 5 --search_mode detail --dir results/def-detail`
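The commands above differ only in `--search_mode` and `--dir`, and all of them read ligand paths from `def_ligands.index`, whose creation is not shown. A minimal Python sketch that writes such an index and rebuilds the three commands follows; the one-path-per-line index layout is an assumption, and the helper names are hypothetical.

```python
import shlex
from pathlib import Path

RECEPTOR = "Uni-Dock/example/screening_test/indata/def.pdbqt"
LIGAND_INDEX = "def_ligands.index"
# Search box (Angstrom) taken from the commands above.
BOX = {"center_x": -36.01, "center_y": 25.63, "center_z": 67.49,
       "size_x": 17.20, "size_y": 14.38, "size_z": 12.24}


def write_ligand_index(ligand_paths, index_file=LIGAND_INDEX):
    """Write one ligand path per line (assumed layout of a --ligand_index file)."""
    Path(index_file).write_text("".join(f"{p}\n" for p in ligand_paths))


def build_command(search_mode):
    """Argument list for one Uni-Dock run with the settings used above."""
    args = ["./unidock", "--receptor", RECEPTOR, "--ligand_index", LIGAND_INDEX]
    for name, value in BOX.items():
        args += [f"--{name}", f"{value:.2f}"]
    args += ["--scoring", "vinardo", "--refine_step", "3", "--num_modes", "1",
             "--seed", "5", "--search_mode", search_mode,
             "--dir", f"results/def-{search_mode}"]
    return args


def command_string(search_mode):
    """Shell-quoted form of build_command(), for printing or copy-pasting."""
    return shlex.join(build_command(search_mode))
```

For example, `write_ligand_index(sorted(Path("path/to/ligands").glob("**/*.pdbqt")))` (hypothetical path) followed by `print(command_string(mode))` for each of `"fast"`, `"balanced"` and `"detail"` reproduces the three commands. Passing `build_command(mode)` directly to `subprocess.run` avoids shell quoting altogether; creating `results/def-<mode>` beforehand with `Path.mkdir(parents=True, exist_ok=True)` is a cheap precaution in case the output directory must already exist.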
2.3 [Run docking] Run molecular docking with Uni-Dock
Approximate runtimes for the full test set:
- Fast Mode: about 10 minutes;
- Balanced Mode: about 30 minutes;
- Detailed Mode: about 40 minutes.
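The notebook's timing code is not shown. The Python sketch below is one hypothetical way to measure a run and derive the quantities reported in the timing results below (total time, average time per ligand, and throughput); the helper names are not from the original.

```python
import subprocess
import time


def summarize_timing(n_ligands, total_seconds):
    """Average time per ligand and throughput for one docking run."""
    return {
        "ligands": n_ligands,
        "total_s": total_seconds,
        "s_per_ligand": total_seconds / n_ligands,
        "ligands_per_s": n_ligands / total_seconds,
    }


def run_and_time(args, n_ligands):
    """Run one docking command (an argument list) and summarize its wall time.

    Hypothetical helper; the original notebook's timing code is not shown.
    """
    start = time.perf_counter()
    subprocess.run(args, check=True)  # raises CalledProcessError on a non-zero exit code
    return summarize_timing(n_ligands, time.perf_counter() - start)
```

Fed the fast-mode total of 448.4152 s for 5,798 ligands, `summarize_timing` gives 0.0773 s/ligand and 12.9 ligands/s after rounding, matching the reported figures.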
| Search mode | Number of ligands | Total time (s) | Average time (s/ligand) | Throughput (ligands/s) |
|---|---|---|---|---|
| fast | 5798 | 448.4152 | 0.0773 | 12.9 |
| balanced | 5798 | 1743.1072 | 0.3006 | 3.3 |
| detail | 5798 | 2465.3312 | 0.4252 | 2.4 |
2.4 [Analysis] Analysis of results
We use the **Enrichment Factor (EF)** to characterize how well molecular docking picks out active compounds. EF measures how much better a screening method performs than random selection: it is the ratio of the proportion of active molecules among the top-ranked results to the proportion of active molecules in the entire dataset. It is calculated as:

$$\mathrm{EF}_{x\%} = \frac{\mathrm{Hits}_{x\%} / N_{x\%}}{\mathrm{Hits}_{\mathrm{total}} / N_{\mathrm{total}}}$$

where $N_{x\%}$ is the number of molecules in the top x% of the ranked list, $\mathrm{Hits}_{x\%}$ is the number of actives among them, and $\mathrm{Hits}_{\mathrm{total}}$ and $N_{\mathrm{total}}$ are the numbers of actives and of all molecules in the dataset (102 and 5,798 here). EF = 1 corresponds to random selection.
| Search Mode | EF1% | EF3% | EF5% | EF10% | EF20% |
|---|---|---|---|---|---|
| fast | 19.61 | 8.82 | 6.27 | 3.92 | 2.50 |
| balanced | 16.67 | 11.11 | 7.84 | 5.00 | 3.38 |
| detail | 14.71 | 10.13 | 7.45 | 4.71 | 3.19 |
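The EF values can be turned back into approximate counts of recovered actives. If the top x% subset holds x·N molecules, then EF = (h / (x·N)) / (A / N) = h / (x·A), so h = EF·x·A with A = 102 actives. The Python sketch below applies this inversion to the table above; the helper name `implied_actives` is hypothetical, and the result is approximate because the rounding of the subset size is not stated.

```python
def implied_actives(ef, fraction, n_actives=102):
    """Invert EF = h / (fraction * n_actives) to estimate h, the actives in the top subset."""
    return round(ef * fraction * n_actives)


# EF values from the table above: search mode -> {fraction: EF}.
EF_TABLE = {
    "fast": {0.01: 19.61, 0.03: 8.82, 0.05: 6.27, 0.10: 3.92, 0.20: 2.50},
    "balanced": {0.01: 16.67, 0.03: 11.11, 0.05: 7.84, 0.10: 5.00, 0.20: 3.38},
    "detail": {0.01: 14.71, 0.03: 10.13, 0.05: 7.45, 0.10: 4.71, 0.20: 3.19},
}

for mode, row in EF_TABLE.items():
    print(mode, {f"{f:.0%}": implied_actives(ef, f) for f, ef in row.items()})
```

For example, fast mode's EF1% of 19.61 corresponds to about 20 of the 102 actives among the top ~58 of 5,798 molecules, while balanced mode places about 34 actives in the top 3% versus about 27 for fast mode.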
Hui_Zhou