Bohrium
robot
新建

空间站广场

论文
Notebooks
比赛
课程
Apps
我的主页
我的Notebooks
我的论文库
我的足迹

我的工作空间

任务
节点
文件
数据集
镜像
项目
数据库
公开
FFF: 基于蛋白片段预测的冷冻电镜结构自动搭建
Deep Learning
cryo-EM
AI4S
Deep Learningcryo-EMAI4S
Yuhang Wang
发布于 2023-07-31
推荐镜像 :fff-notebook:v0.2.3
推荐机型 :c16_m62_1 * NVIDIA T4
赞 8
8
FFF: 基于蛋白片段预测的冷冻电镜结构自动搭建
冷冻电镜结构解析技术
冷冻电镜全原子模型结构搭建
FFF 使用案例展示
ASCT2: 一个典型的多构象蛋白
AlphaFold2 预测结构
传统电镜结构搭建方法
使用FFF算法
变量定义
1. 密度图识别
三维片段结构预测
片段预测效果
2. 蛋白全原子结构搭建
初始蛋白结构与处理
格点文件&结构约束文件的生成
结构拟合
3. 预测和已发表结构的比较
最终输出文件
输出结构展示
总结
参考文献

FFF: 基于蛋白片段预测的冷冻电镜结构自动搭建

代码
文本

作者: 陈伟杰,王新颜,王宇航
镜像:fff-notebook:v0.2.3
机型:c16_m62_1 Nvidia T4 (可选择更高配置的显卡)
日期:2023-07-31
Copyright 2023 @ Authors
快速开始:点击上方的 开始连接 按钮(默认使用 fff-notebook:v0.2.3 镜像),稍等片刻即可运行。如您遇到任何问题,请联系 bohrium@dp.tech
提示:运行此notebook 需要用到含有T4显卡的非免费计算资源

代码
文本

冷冻电镜结构解析技术

冷冻电镜(Cryo-electron microscopy,简称Cryo-EM)是一种先进的生物成像技术,近年来已成为生物学领域解析分子结构的重要工具,尤其对于研究大型生物分子复合物和膜蛋白结构具有显著的优势。它的工作原理是通过冷冻生物样品,使生物分子在低温条件下保持原有的天然状态,然后利用高能电子束透射样品,收集透射电子图像,最后通过计算机进行图像处理和三维重建,得到生物分子的高分辨率三维结构。由于冷冻电镜避免了传统X射线晶体学需要制备晶体的繁琐过程,因此受到了广泛的关注。

尽管冷冻电镜技术在生物结构研究中发挥着巨大的作用,但在最后一步全原子模型结构搭建方面仍然存在挑战。首先,冷冻电镜图像本身存在信噪比低的问题,这是由于电子束对样品的辐照损伤和生物分子的多种构象等因素导致的。这使得分辨率受到限制,从而影响了原子模型的精确性。此外,在将冷冻电镜重建的电子密度图转化为全原子模型时,仍需要依赖于先验的生物信息、模板和模型优化等手段。这些方法的准确性和可靠性在很大程度上决定了最终原子模型的质量,但在某些情况下,如缺乏可用模板或未知的新型结构等,这些方法可能会受到限制。

代码
文本

冷冻电镜全原子模型结构搭建

冷冻电镜结构解析的最终目的是通过实验中观测到的图像来解析目标大分子的全原子结构。目前有多种方法用于原子模型的构建,可以分为手动建模和自动建模两大类。这两种方法各自具有不同的优劣之处。

1.手动建模:利用密度图,研究者可以在图形化界面上手动构建生物大分子的原子模型(如使用软件COOT)。这种方法具有较高的自由度,特别是在密度图中细节不明显的情况下,研究者可以根据已知的化学信息和经验进行推断。然而,手动建模的过程非常耗时,且结果受到研究者经验和判断的影响。

2.自动建模:自动建模软件(如Phenix、Rosetta、ARP/wARP, MDFF等)能够根据密度图自动生成原子模型。这种方法具有较高的效率和一致性,减少了人为因素的干扰。然而,在密度图分辨率较低或模型复杂度较高的情况下,自动建模的准确性可能受到限制。

为了解决现有方案中的弊端,我们提出了一种新方法FFF("Fragment-guided Flexible Fitting”)[5],它可以从冷冻电镜实验数据中构建出更准确和完整的蛋白质结构。FFF通过结合蛋白质结构预测和蛋白质结构识别以及柔性拟合算法,实现了更可靠的冷冻电镜结构建模。

代码
文本
[1]
from PIL import Image

def show_image(image_path):
# 打开图片文件
image = Image.open(image_path).convert("RGBA")
new_image = Image.new("RGBA", image.size, "WHITE")
new_image.paste(image, mask=image)
# 显示图片
new_image.convert("RGB").show()
代码
文本
[2]
show_image("/demo/imgs/pipeline.png")
代码
文本

FFF 使用案例展示

接下来我们用一个真实的案例来展示一下FFF算法的效果。

代码
文本

ASCT2: 一个典型的多构象蛋白

ASCT2 是转运蛋白的一种,由于功能需要,这个蛋白有多种稳定的构象。最主要的构象有两种,一种构象是朝细胞内打开的(6RVX[1], 另一个是朝细胞外打开的状态 (7BCQ)[2],。 我们在这个案例里将根据7BCQ密度图搭建第一种构象的全原子模型结构。下面展示的是已发表的该构象的结构[1]

代码
文本
[3]
# Garaeva, A.A., Guskov, A., Slotboom, D.J. et al.
# A one-gate elevator mechanism for the human neutral amino acid transporter ASCT2.
# Nat. Commun. 10, 3427 (2019) CC-BY 4.0 (https://doi.org/10.1038/s41467-019-11363-x)
show_image("/demo/imgs/multi_confs.png")
代码
文本
[4]
show_image("/work/showcase/7BCQ_rcsb.png")
代码
文本

AlphaFold2 预测结构

下面是AlphaFold2算法[3] 预测的结构,呈现朝内打开的结构,和我们的目标结构相差很大。

代码
文本
[5]
%%bash
echo "TM score (AlphaFold)"
TMscore \
/demo/af2_pdb/7BCQ.pdb \
/demo/pdb/7BCQ.pdb > TMscore.af2.log
cat TMscore.af2.log | grep "TM-score ="
TM score (AlphaFold)
TM-score    = 0.5766  (d0= 7.54)
代码
文本
[6]
show_image("/work/showcase/7BCQ_af.png")
代码
文本
[7]
show_image("/work/showcase/af_vs_rcsb.png")
代码
文本

传统电镜结构搭建方法

MDFF (Molecuar Dynamics Flexible Fitting) 是一种冷冻电镜结构搭建的传统方法[4],我们来试一下它在这个案例上的结构搭建效果。

代码
文本

从下面的结果可以看出,MDFF不能很好的搭建出符合密度图的三维原子模型结构,这主要是因为MDFF 在初始结构和目标结构相差很大时候很容易陷入局部最优解。

代码
文本
[8]
show_image("/work/showcase/7BCQ_mdff.png")
代码
文本
[9]
show_image("/work/showcase/mdff_vs_rcsb.png")
代码
文本

使用FFF算法

最后我们来试试用FFF自动搭建一下ASCT2的全原子模型结构。

代码
文本
[10]
show_image("/demo/imgs/networks.png")
代码
文本

变量定义

代码
文本
[11]
from dpemm.minimize_struct import minimize_struct
from dpemm.prep_grid import prep_grid
from dpemm.prep_restr import prep_restr
from dpemm.run_TMD import run_TMD
from dpemm.run_CMD import run_CMD
from dpemm.preprocess.prep_std_map import prep_std_map
from dpemm.preprocess.prep_apix_map import prep_apix_map
from dpemm.utils import basename_no_ext, rm_ext
import os
import shutil
import subprocess
from typing import Optional
代码
文本
[12]
input_pdb = "/demo/af2_pdb/7BCQ.pdb"
input_map = "/demo/map/7BCQ.apix1.mrc"
input_fasta = "/demo/fasta/7BCQ.fasta"
input_infer_weights = "/ckpt/fffw_304000.pt"
input_infer_config = "/ckpt/train_config.json"
output_dir = "/data/fff_demo/output"
output_pdb = f"{output_dir}/7bcq_fff.pdb"
output_dcd = f"{output_dir}/7bcq_fff.dcd"
input_pdb_ref = "/demo/pdb/7BCQ.pdb"
tmd_num_steps = 12000
tmd_update_freq = 1000
cmd_total_steps = 5000
cmd_num_stages = 2
cmd_steps_per_stage = 500
cmd_k = 1e4
cmd_mdff_k = 200.
cmd_temperature = 10.
gpu_device = 0
grid_res = 3.0
debug = False
time_it = False

# 如果不想重复运行已经做完的步骤,把cache_files 设成 True
cache_files = False

os.makedirs(output_dir, exist_ok=True)
代码
文本

1. 密度图识别

代码
文本

我们首先需要将输入密度图转换成标准的密度图(像素大小为1 Å)以保证输入密度图和模型训练时用的密度图在体素大小上是一致的。另外,我们也需要生成一个方差图。

代码
文本
[13]
output_std_map = os.path.join(
output_dir,
"{}_std_map.mrc".format(basename_no_ext(input_map)),
)
output_apix_map = os.path.join(
output_dir,
"{}_apix_map.mrc".format(basename_no_ext(input_map)),
)

if cache_files and os.path.exists(output_apix_map):
print(f"cached: {output_apix_map}")
else:
prep_apix_map(
input_map=input_map,
output_map=output_apix_map,
output_apix=1.0,
debug=debug,
)

if cache_files and os.path.exists(output_std_map):
print(f"cached: {output_std_map}")
else:
prep_std_map(
input_map=output_apix_map,
output_map=output_std_map,
debug=debug,
)
os.listdir(output_dir)
['7BCQ.apix1.ccp4',
 '7BCQ.apix1_apix_map.mrc',
 '7BCQ.apix1_res_3.0.dx',
 '7BCQ.apix1_std_map.mrc',
 '7BCQ_clean.pdb',
 '7BCQ_clean_chain.pdb',
 '7BCQ_clean_clean.pdb',
 '7BCQ_clean_no_hetero.pdb',
 '7BCQ_cmd.dcd',
 '7BCQ_cmd.pdb',
 '7BCQ_cmd.rst',
 '7BCQ_cmd.tmscore.txt',
 '7BCQ_cmd_cmd_config.yml',
 '7BCQ_infer.cif',
 '7BCQ_infer.pdb',
 '7BCQ_infer.txt',
 '7BCQ_infer_backbone.mrc',
 '7BCQ_restr.exb',
 '7BCQ_restr_config.yml',
 '7BCQ_tmd.pdb',
 '7BCQ_tmd.tmscore.txt',
 '7BCQ_tmd_raw.pdb',
 '7BCQ_tmd_tmd.dcd',
 '7BCQ_tmd_tmd.rst',
 '7BCQ_tmd_tmd_config.yml',
 '7bcq_fff.dcd',
 '7bcq_fff.pdb']
代码
文本

三维片段结构预测

我们现在就可以对密度图进行识别并生成若干个蛋白片段。对密度图进行片段识别需要依赖很多信息,包括原子的概率,位置和氨基酸类型,以及伪肽键向量(pseudo peptide vector)。

代码
文本
[14]
output_infer_txt = os.path.join(
output_dir,
"{}_infer.txt".format(basename_no_ext(input_pdb)),
)
output_infer_cif = f"{rm_ext(output_infer_txt)}.cif"
output_infer_pdb = f"{rm_ext(output_infer_txt)}.pdb"
output_backbone_map = f"{rm_ext(output_infer_txt)}_backbone.mrc"

cmd = [
"fff",
"infer",
"--output-txt", output_infer_txt,
"--output-cif", output_infer_cif,
"--output-pdb", output_infer_pdb,
"--input-config", input_infer_config,
"--input-weights", input_infer_weights,
"--input-raw-map", output_apix_map,
"--input-std-map", output_std_map,
"--input-fasta", input_fasta,
"--output-backbone-map", output_backbone_map,
"--confidence", "0.3",
"--length-cutoff", "2",
"--device", str(gpu_device),
]

if cache_files and os.path.exists(output_infer_txt):
print(f"cached: {output_infer_txt}")
else:
subprocess.run(cmd)

cmd_str = " ".join(cmd)
print(cmd_str)

os.listdir(output_dir)
/opt/conda/envs/dpemm/lib/python3.9/site-packages/torch/nn/functional.py:3704: UserWarning: nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.
  warnings.warn("nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.")
(64, 64, 64) -> (64, 64, 64)
0
32
match domain from given fasta: /demo/fasta/7BCQ.fasta
num_residue: 236
num_domain: 18
domain mean length: 13.11111111111111
/data/fff_demo/output/7BCQ_infer.txt
/data/fff_demo/output/7BCQ_infer.cif
/data/fff_demo/output/7BCQ_infer.pdb
/data/fff_demo/output/7BCQ_infer_backbone.mrc
fff infer --output-txt /data/fff_demo/output/7BCQ_infer.txt --output-cif /data/fff_demo/output/7BCQ_infer.cif --output-pdb /data/fff_demo/output/7BCQ_infer.pdb --input-config /ckpt/train_config.json --input-weights /ckpt/fffw_304000.pt --input-raw-map /data/fff_demo/output/7BCQ.apix1_apix_map.mrc --input-std-map /data/fff_demo/output/7BCQ.apix1_std_map.mrc --input-fasta /demo/fasta/7BCQ.fasta --output-backbone-map /data/fff_demo/output/7BCQ_infer_backbone.mrc --confidence 0.3 --length-cutoff 2 --device 0
['7BCQ.apix1.ccp4',
 '7BCQ.apix1_apix_map.mrc',
 '7BCQ.apix1_res_3.0.dx',
 '7BCQ.apix1_std_map.mrc',
 '7BCQ_clean.pdb',
 '7BCQ_clean_chain.pdb',
 '7BCQ_clean_clean.pdb',
 '7BCQ_clean_no_hetero.pdb',
 '7BCQ_cmd.dcd',
 '7BCQ_cmd.pdb',
 '7BCQ_cmd.rst',
 '7BCQ_cmd.tmscore.txt',
 '7BCQ_cmd_cmd_config.yml',
 '7BCQ_infer.cif',
 '7BCQ_infer.pdb',
 '7BCQ_infer.txt',
 '7BCQ_infer_backbone.mrc',
 '7BCQ_restr.exb',
 '7BCQ_restr_config.yml',
 '7BCQ_tmd.pdb',
 '7BCQ_tmd.tmscore.txt',
 '7BCQ_tmd_raw.pdb',
 '7BCQ_tmd_tmd.dcd',
 '7BCQ_tmd_tmd.rst',
 '7BCQ_tmd_tmd_config.yml',
 '7bcq_fff.dcd',
 '7bcq_fff.pdb']
代码
文本

片段预测效果

下面展示的是预测的片段结构。

代码
文本
[15]
show_image("/work/showcase/7BCQ_domain.png")
代码
文本

展示的是FFF预测的主干密度图(灰)与输入密度图(浅蓝)的对照,主干密度图表示每个voxel属于主干原子(C, C, N)的概率。

后续使用密度图拟合的过程默认采用输入密度图,用户也可以采用主干密度图进行拟合。

代码
文本
[16]
show_image("/work/showcase/raw_vs_bb_map.png")
代码
文本

2. 蛋白全原子结构搭建

有了预测的蛋白片段之后,我们就以这些片段作为结构约束来搭建一个完整的蛋白结构。下面展示是这部分算法的流程图。

代码
文本
[17]
show_image("/demo/imgs/tmdff.png")
代码
文本

初始蛋白结构与处理

我们拿到的蛋白结构文件通常有很多缺失信息(氢原子,某些残基侧链等等)。我们首先需要对输入初始结构进行修复。

代码
文本

从下面的结果可以很清楚的看出FFF搭建的结果和目标结构非常接近,基本可以满足冷冻电镜结构搭建的需求。

代码
文本
[18]
output_pdb_minimized = os.path.join(
output_dir,
"{}_clean.pdb".format(basename_no_ext(input_pdb)),
)

if cache_files and os.path.isfile(output_pdb_minimized):
print("Using cached file: {}".format(output_pdb_minimized))
else:
minimize_struct(
input_pdb=input_pdb,
output_pdb=output_pdb_minimized,
debug=debug,
time_it=time_it,
)
Finding missing atoms...
Adding missing atoms...
Writing output...
Done.
Load PDB... Done.
Re-organize chain id... Done.
Warning: importing 'simtk.openmm' is deprecated.  Import 'openmm' instead.
Finding missing residues...
Finding nonstandard residues...
Replacing nonstandard residues...
Finding missing atoms...
Adding missing atoms...
Adding missing hydrogens...
Writing output...
Done.
Find GB force
Minimization...
Before:
437100.15625 kJ/mol
-20948833.0407383 kJ/(nm mol)
After:
-54571.84375 kJ/mol
-1248.0090580619872 kJ/(nm mol)
Done.
代码
文本

格点文件&结构约束文件的生成

接下来我们需要开准备一下动态模拟所需要的格点文件和结构约束文件。

代码
文本
[19]
input_map_ccp4 = os.path.join(
output_dir,
"{}.ccp4".format(basename_no_ext(input_map)),
)

output_grid = "{}_res_{}.dx".format(
rm_ext(input_map_ccp4),
grid_res,
)

if cache_files and os.path.isfile(output_grid):
print("Using cached file: {}".format(output_grid))
else:
shutil.copy(input_map, input_map_ccp4)
prep_grid(
input_map=input_map_ccp4,
output_grid=output_grid,
res=grid_res,
)
dpems grid --input /data/fff_demo/output/7BCQ.apix1.ccp4 --output /data/fff_demo/output/7BCQ.apix1_res_3.0.dx --rinp 3 --rout 3.0
>> input is a ccp4 file: /data/fff_demo/output/7BCQ.apix1.ccp4
>> origin from /data/fff_demo/output/7BCQ.apix1.ccp4: [55.65999806 86.019997   72.86399746]
GRID SIZE: 65 x 65 x 65
>> output grid origin [55.65999806 86.019997   72.86399746]
代码
文本
[20]
output_restr = os.path.join(
output_dir,
"{}_restr.exb".format(basename_no_ext(input_pdb)),
)

if cache_files and os.path.isfile(output_restr):
print("Using cached file: {}".format(output_restr))
else:
prep_restr(
input_pdb=output_pdb_minimized,
output_restr=output_restr,
)
print("restraint file: {}".format(output_restr))
dpems optstruc --configure /data/fff_demo/output/7BCQ_restr_config.yml
PLATFORM: CUDA
Writing SSrestraint
Writing CHIRALrestraint
Writing CISrestraint
restraint file: /data/fff_demo/output/7BCQ_restr.exb
代码
文本

结构拟合

我们下一步需要做的是将初始结构拟合到预测的片段结构上,同时我们也会用由密度图转化的格点文件来指引结构拟合的全过程。

代码
文本
[21]
output_pdb_tmd = os.path.join(
output_dir,
"{}_tmd.pdb".format(basename_no_ext(input_pdb)),
)

if cache_files and os.path.isfile(output_pdb_tmd):
print("Using cached file: {}".format(output_pdb_tmd))
else:
run_TMD(
input_pdb_init=output_pdb_minimized,
input_pdb_target=output_infer_pdb,
input_restr=output_restr,
output_pdb=output_pdb_tmd,
num_steps=tmd_num_steps,
tmd_update_freq=tmd_update_freq,
gpu_device=gpu_device,
debug=debug,
time_it=time_it,
)

dpems tmd --init-pdb /data/fff_demo/output/7BCQ_clean.pdb --restraint /data/fff_demo/output/7BCQ_restr.exb --coupling-config /data/fff_demo/output/7BCQ_tmd_tmd_config.yml --output-restart /data/fff_demo/output/7BCQ_tmd_tmd.rst --output-dcd /data/fff_demo/output/7BCQ_tmd_tmd.dcd --output-pdb /data/fff_demo/output/7BCQ_tmd_raw.pdb --output-pdb-aligned /data/fff_demo/output/7BCQ_tmd.pdb --temperature 10 --nsteps 12000 --traj-freq 1000 --report-freq 1000 --tmd-update-freq 1000 --platform CUDA
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.

>>> 236 atoms selected for TMD


>>> initial RMSD: 8.48 A


Stage 1: gamma = 0.92

#"Step","Potential Energy (kJ/mole)","Temperature (K)","Density (g/mL)","Speed (ns/day)","Time Remaining"
1000,-55395.15953086443,10.935697480964484,9.625561292924427,0,--
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
[stage 1] >>> save /data/fff_demo/output/7BCQ_tmd_raw.pdb
[stage 1] >>> save /data/fff_demo/output/7BCQ_tmd.pdb
[stage 1] >>> rmsd: 7.80 A (gamma = 0.9166666666666666)

Stage 2: gamma = 0.83

2000,-55288.67541526787,12.075340646822927,9.625561292924427,69.1,0:25
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
[stage 2] >>> save /data/fff_demo/output/7BCQ_tmd_raw.pdb
[stage 2] >>> save /data/fff_demo/output/7BCQ_tmd.pdb
[stage 2] >>> rmsd: 7.13 A (gamma = 0.8333333333333334)

Stage 3: gamma = 0.75

3000,-55083.09274333183,11.651242182053384,9.625561292924427,71.2,0:21
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
[stage 3] >>> save /data/fff_demo/output/7BCQ_tmd_raw.pdb
[stage 3] >>> save /data/fff_demo/output/7BCQ_tmd.pdb
[stage 3] >>> rmsd: 6.46 A (gamma = 0.75)

Stage 4: gamma = 0.67

4000,-54961.81337345173,12.08808337728884,9.625561292924427,72.6,0:19
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
[stage 4] >>> save /data/fff_demo/output/7BCQ_tmd_raw.pdb
[stage 4] >>> save /data/fff_demo/output/7BCQ_tmd.pdb
[stage 4] >>> rmsd: 5.77 A (gamma = 0.6666666666666667)

Stage 5: gamma = 0.58

5000,-54775.03003960011,12.387954444093356,9.625561292924427,72.7,0:16
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
[stage 5] >>> save /data/fff_demo/output/7BCQ_tmd_raw.pdb
[stage 5] >>> save /data/fff_demo/output/7BCQ_tmd.pdb
[stage 5] >>> rmsd: 5.05 A (gamma = 0.5833333333333333)

Stage 6: gamma = 0.50

6000,-54542.08228012238,11.673589617894702,9.625561292924427,73.3,0:14
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
[stage 6] >>> save /data/fff_demo/output/7BCQ_tmd_raw.pdb
[stage 6] >>> save /data/fff_demo/output/7BCQ_tmd.pdb
[stage 6] >>> rmsd: 4.34 A (gamma = 0.5)

Stage 7: gamma = 0.42

7000,-54390.491832383756,12.691338374008142,9.625561292924427,73.3,0:11
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
[stage 7] >>> save /data/fff_demo/output/7BCQ_tmd_raw.pdb
[stage 7] >>> save /data/fff_demo/output/7BCQ_tmd.pdb
[stage 7] >>> rmsd: 3.62 A (gamma = 0.41666666666666663)

Stage 8: gamma = 0.33

8000,-54243.49506762587,12.286542322856695,9.625561292924427,73.4,0:09
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
[stage 8] >>> save /data/fff_demo/output/7BCQ_tmd_raw.pdb
[stage 8] >>> save /data/fff_demo/output/7BCQ_tmd.pdb
[stage 8] >>> rmsd: 2.92 A (gamma = 0.33333333333333337)

Stage 9: gamma = 0.25

9000,-54145.08429337973,12.500200522709907,9.625561292924427,73.6,0:07
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
[stage 9] >>> save /data/fff_demo/output/7BCQ_tmd_raw.pdb
[stage 9] >>> save /data/fff_demo/output/7BCQ_tmd.pdb
[stage 9] >>> rmsd: 2.22 A (gamma = 0.25)

Stage 10: gamma = 0.17

10000,-53847.063096414015,12.377334237655072,9.625561292924427,73.9,0:04
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
[stage 10] >>> save /data/fff_demo/output/7BCQ_tmd_raw.pdb
[stage 10] >>> save /data/fff_demo/output/7BCQ_tmd.pdb
[stage 10] >>> rmsd: 1.62 A (gamma = 0.16666666666666663)

Stage 11: gamma = 0.08

11000,-53416.41181881514,12.860831477990114,9.625561292924427,73.9,0:02
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
[stage 11] >>> save /data/fff_demo/output/7BCQ_tmd_raw.pdb
[stage 11] >>> save /data/fff_demo/output/7BCQ_tmd.pdb
[stage 11] >>> rmsd: 1.01 A (gamma = 0.08333333333333337)

Stage 12: gamma = 0.00

12000,-52730.25720070057,14.061296279725285,9.625561292924427,73.9,0:00
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
[stage 12] >>> save /data/fff_demo/output/7BCQ_tmd_raw.pdb
[stage 12] >>> save /data/fff_demo/output/7BCQ_tmd.pdb
[stage 12] >>> rmsd: 0.57 A (gamma = 0.0)
代码
文本
[22]
output_pdb_cmd = os.path.join(
output_dir,
"{}_cmd.pdb".format(basename_no_ext(input_pdb)),
)

output_dcd_cmd = os.path.join(
output_dir,
"{}_cmd.dcd".format(basename_no_ext(input_pdb)),
)

if cache_files and os.path.isfile(output_pdb_cmd):
print("Using cached file: {}".format(output_pdb_cmd))
else:
cmd_traj_freq = int(cmd_total_steps / 10)
cmd_report_freq = int(cmd_total_steps / 10)
run_CMD(
input_pdb_init=output_pdb_tmd,
input_pdb_target=output_infer_pdb,
input_restr=output_restr,
output_pdb=output_pdb_cmd,
output_dcd=output_dcd_cmd,
total_steps=cmd_total_steps,
traj_freq=cmd_traj_freq,
report_freq=cmd_report_freq,
temperature=cmd_temperature,
cmd_selection='name CA',
cmd_k=cmd_k,
cmd_total_stages=cmd_num_stages,
cmd_steps_per_stage=cmd_steps_per_stage,
mdff_grid=output_grid,
mdff_k=cmd_mdff_k,
mdff_selection="all",
platform="CUDA",
gpu_device=gpu_device,
debug=True,
time_it=True,
)
{'cmd_k': 10000.0,
 'cmd_selection': 'name CA',
 'cmd_steps_per_stage': 500,
 'cmd_total_stages': 2,
 'gpu_device': 0,
 'input_pdb_init': '/data/fff_demo/output/7BCQ_tmd.pdb',
 'input_pdb_target': '/data/fff_demo/output/7BCQ_infer.pdb',
 'input_restr': '/data/fff_demo/output/7BCQ_restr.exb',
 'output_dcd': '/data/fff_demo/output/7BCQ_cmd.dcd',
 'output_pdb': '/data/fff_demo/output/7BCQ_cmd.pdb',
 'output_rst': None,
 'platform': 'CUDA',
 'report_freq': 500,
 'temperature': 10.0,
 'total_steps': 5000,
 'traj_freq': 500}
dpems cmd --input-pdb /data/fff_demo/output/7BCQ_tmd.pdb --coupling-config /data/fff_demo/output/7BCQ_cmd_cmd_config.yml --output-restart /data/fff_demo/output/7BCQ_cmd.rst --output-dcd /data/fff_demo/output/7BCQ_cmd.dcd --output-pdb /data/fff_demo/output/7BCQ_cmd.pdb --temperature 10.0 --total-steps 5000 --traj-freq 500 --report-freq 500 --cmd-total-stages 2 --cmd-steps-per-stage 500 --platform CUDA --debug --restraint /data/fff_demo/output/7BCQ_restr.exb
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.07s.
CREATE BIAS USING MAP: /data/fff_demo/output/7BCQ.apix1_res_3.0.dx
INPUTMAP: 64 x 64 x 64
CREATEMAP: 64 x 64 x 64
>>> MDFF biases: [<openmm.openmm.CustomCompoundBondForce; proxy of <Swig Object of type 'OpenMM::CustomCompoundBondForce *' at 0x7f23b196ec90> >]
>>> All biases: [<openmm.openmm.CustomExternalForce; proxy of <Swig Object of type 'OpenMM::CustomExternalForce *' at 0x7f23b1945630> >, <openmm.openmm.CustomCompoundBondForce; proxy of <Swig Object of type 'OpenMM::CustomCompoundBondForce *' at 0x7f23b196ec90> >]
>>> Add restraints (SS, cis, chiral) using "/data/fff_demo/output/7BCQ_restr.exb"
CMD Stage 1: gamma: 0.500 (236 atoms restrained)
#"Step","Potential Energy (kJ/mole)","Temperature (K)","Density (g/mL)","Speed (ns/day)","Time Remaining"
500,-97090.2734375,11.759734960739115,9.625561292924427,0,--
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
>>> RMSD: 0.72 Å

>>> atom 82: xyz=[ 6.894629   11.00670433  8.39646816] nm; sys: xyz=[ 6.8741 11.0359  8.3503] nm; ref: xyz=[ 6.9458 10.9505  8.478 ] nm; 
CMD Stage 2: gamma: 1.000 (236 atoms restrained)
1000,-95746.671875,13.16433019533872,9.625561292924427,70.8,0:09
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
>>> RMSD: 0.65 Å

>>> atom 82: xyz=[ 6.92476416 10.98426247  8.44348335] nm; sys: xyz=[ 6.8741 11.0359  8.3503] nm; ref: xyz=[ 6.9458 10.9505  8.478 ] nm; 
>>> Run MD with constraints (4000 steps to go; 236 atoms restrained)
1500,-96080.484375,12.115721037961649,9.625561292924427,70.2,0:08
2000,-96188.5859375,10.847192694100114,9.625561292924427,97.9,0:05
2500,-96217.4140625,10.104542208772175,9.625561292924427,122,0:03
3000,-96229.640625,10.122088472905846,9.625561292924427,143,0:02
3500,-96230.3203125,10.066886989533359,9.625561292924427,162,0:01
4000,-96215.984375,9.754862200097236,9.625561292924427,176,0:00
4500,-96229.625,9.944912422566823,9.625561292924427,191,0:00
5000,-96230.578125,9.86418352907686,9.625561292924427,204,0:00
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.07s.
>>> RMSD: 0.65 Å

>>> atom 82: xyz=[ 6.9252367  10.98064423  8.4386816 ] nm; sys: xyz=[ 6.8741 11.0359  8.3503] nm; ref: xyz=[ 6.9458 10.9505  8.478 ] nm; 
Done!
>>> Total number of steps: 5000
>>> output pdb: /data/fff_demo/output/7BCQ_cmd.pdb
Time elapsed: 12.634897708892822 s
代码
文本

3. 预测和已发表结构的比较

最后我们比较一下预测的和已发表的结构相差多大。

代码
文本
[23]
if input_pdb_ref is not None:
print("Intermediate TM Score (after TMD))")
output_tmscore_tmd = os.path.join(
output_dir,
"{}_tmd.tmscore.txt".format(basename_no_ext(input_pdb)),
)
with open(output_tmscore_tmd, "w") as f:
cmd = [
"TMscore",
output_pdb_tmd,
input_pdb_ref,
]
subprocess.run(cmd, stdout=f, check=True, text=True)

subprocess.run(f"cat {output_tmscore_tmd} | grep \"TM-score =\"",
shell=True, check=True, text=True)

print("-----------------")

print("Final TM score (after CMD)")
output_tmscore_cmd = os.path.join(
output_dir,
"{}_cmd.tmscore.txt".format(basename_no_ext(input_pdb)),
)
with open(output_tmscore_cmd, "w") as f:
cmd = [
"TMscore",
output_pdb_cmd,
input_pdb_ref,
]
subprocess.run(cmd, stdout=f, check=True, text=True)

subprocess.run(f"cat {output_tmscore_cmd} | grep \"TM-score =\"",
shell=True, check=True, text=True)
print("-----------------")

Intermediate TM Score (after TMD))
TM-score    = 0.8989  (d0= 7.54)
-----------------
Final TM score (after CMD)
TM-score    = 0.9096  (d0= 7.54)
-----------------
代码
文本

最终输出文件

最后我把中间文件拷贝一份到最终输出文件。

代码
文本
[24]
shutil.copy(output_pdb_cmd, output_pdb)
shutil.copy(output_dcd_cmd, output_dcd)
print(f">>> output pdb: {output_pdb}")
print(f">>> output dcd: {output_dcd}")
>>> output pdb: /data/fff_demo/output/7bcq_fff.pdb
>>> output dcd: /data/fff_demo/output/7bcq_fff.dcd
代码
文本

输出结构展示

下面展示的是FFF预测的结构(黑)与输入密度图的对照,以及与已发表的结构(红)的比较。你会发现结构搭建的效果和密度图的局部质量相关。对与特征弱的区域(比如loop区)由于结构预测/搭建过程中受到的约束较小,所以和已发表的结构相会差偏大。对于密度特征强的区域,自动搭建的结构和已发表结构基本吻合。

代码
文本
[25]
show_image("/work/showcase/7BCQ_FFF.png")
代码
文本
[26]
show_image("/work/showcase/FFF_vs_rcsb.png")
代码
文本

总结

代码
文本

冷冻电镜全原子模型结构搭建的方法虽然很多,但是对于中等分辨率的电镜密度图做准确且自动的结构搭建仍然是个挑战。FFF通过结合计算机视觉领域的三维识别算法和计算模拟领域的分子动态模拟技术实现了对蛋白结构的自动化搭建,而且在精准程度上超过了传统方法以及蛋白结构预测方法。未来FFF算法将被扩展到DNA/RNA/小分子的结构搭建上。此外,我们开发了基于FFF 算法的App (https://app.bohrium.dp.tech/fff), 以方便更多人能将FFF 应用到自己的cryo-EM 数据处理工作流当中。FFF App 的详细使用说明可以参考这个文档

代码
文本

alt FFF_App_snapshot.png

代码
文本

参考文献

  1. Garaeva, A.A., Guskov, A., Slotboom, D.J. et al. A one-gate elevator mechanism for the human neutral amino acid transporter ASCT2. Nat Commun 10, 3427 (2019)
  2. Garibsingh RA, Ndaru E, Garaeva AA, Shi Y, Zielewicz L, Zakrepine P, Bonomi M, Slotboom DJ, Paulino C, Grewer C, Schlessinger A. Rational design of ASCT2 inhibitors using an integrated experimental-computational approach. Proc. Natl. Acad. Sci. (U. S. A.) 118:e2104093118. (2021)
  3. Jumper, J., Evans, R., Pritzel, A. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021)
  4. Trabuco LG, Villa E, Mitra K, Frank J, Schulten K. Flexible fitting of atomic structures into electron microscopy maps using molecular dynamics. Structure. 16:673-83 (2008)
  5. Weijie Chen, Xinyan Wang, and Yuhang Wang. FFF: Fragment-Guided Flexible Fitting for Building Complete Protein Structures. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2023 pp. 19776-19785 (2023)
代码
文本
Deep Learning
cryo-EM
AI4S
Deep Learningcryo-EMAI4S
已赞8
本文被以下合集收录
App related
Charmy Niu
更新于 2024-01-17
10 篇3 人关注
测试合集文章列表100篇
xingyanshi@dp.tech很长的名字xingyanshi@dp.tech很长的名字很长的
更新于 2024-08-04
104 篇2 人关注
推荐阅读
公开
Uni-Mol性质预测实战-回归任务-有机/电解液分子的熔点预测
Uni-MolDeep Learning中文
Uni-MolDeep Learning中文
Letian
发布于 2023-10-31
3 赞4 转存文件
公开
AI4S Cup 学习赛--超导体临界温度预测 tensor
中文notebookAI4S
中文notebookAI4S
Waytogo
发布于 2024-05-20
评论
 from dpemm.minimize_...

Hui_Zhou

10-19 04:25
RuntimeError: this license key is expired
评论
 ## 3. 预测和已发表结构的比较 最后...

镜影映照

2023-08-28
可以增加一个预测结构生成的密度图 vs 初始密度图的对比吗,感觉可以增加说服力。

Yuhang Wang

作者
2023-09-03
行,那就加个密度图对比

Yuhang Wang

作者
2023-09-04
新加了初始密度图和预测的密度图(backbone map)的比较
评论