FFF | Fragment-Guided Flexible Fitting

Deep Learning

cryo-EM

AI4S

Yuhang Wang

Published on 2023-09-06

Recommended image: DPEM:0.2.3

Recommended machine type: c16_m62_1 * NVIDIA T4

FFF | Fragment-Guided Flexible Fitting for Cryo-EM

Cryo-electron microscopy technology for structural analysis

Cryo-electron microscopy all-atom model structure construction

FFF tutorial

ASCT2: A multi-conformational protein

AlphaFold2 predicted structure

Traditional electron microscope structure construction method

Use FFF algorithm

Variable definitions

1. Density map recognition

3D fragment structure prediction

Fragment prediction effect

2. Protein full atomic structure construction

Initial protein structure and processing

Generation of grid file & structural constraint file

Structure Fitting

3. Comparison of predicted and published structures

Final output file

Output structure display

Summary

References

FFF | Fragment-Guided Flexible Fitting for Cryo-EM

Authors: Chen Weijie, Wang Xinyan, Wang Yuhang
Docker image: fff-notebook:v0.2.3
Node type: c16_m62_1 Nvidia T4 (upgradable)
Date: 2023-07-31
Copyright 2023 @ Authors
Quick Start: Click the button above to connect to a node (by default, the fff-notebook:v0.2.3 image is used); the notebook will be ready to run after a short wait. If you encounter any problems, please contact bohrium@dp.tech.
Tip: Running this notebook requires paid computing resources with a T4 GPU.

Cryo-electron microscopy technology for structural analysis

Cryo-electron microscopy (Cryo-EM) is an advanced imaging technology in biology. In recent years, it has become an important tool for determining the structures of biomolecules, especially large biomolecular complexes and membrane proteins. Cryo-EM works by freezing the biological sample so that the biomolecules maintain their native state at low temperature. A high-energy electron beam is then transmitted through the sample, transmission images are collected, and a high-resolution three-dimensional structure of the biomolecules is obtained through computational image processing and three-dimensional reconstruction. Because Cryo-EM avoids the tedious crystallization required by traditional X-ray crystallography, it has attracted widespread attention.

Although Cryo-EM plays a major role in biomedical structure research, the last step, building the all-atom model, remains challenging. First, Cryo-EM images have a low signal-to-noise ratio, which is related to electron-beam radiation damage to the sample (the dose must be kept low) and to the conformational variety of biomolecules. This limits the resolution and therefore the accuracy of the atomic model. Second, converting the density map reconstructed by Cryo-EM into an all-atom model still relies on prior biological information, templates, and model optimization. The accuracy and reliability of these methods largely determine the quality of the final atomic model, and they may fall short in some cases, for example when no template is available or the structure is entirely new.

Cryo-electron microscopy all-atom model structure construction

The ultimate goal of cryo-EM structure analysis is to determine the all-atom structure of the target macromolecule from the experimentally observed images. Many methods exist for building atomic models. They fall into two categories, manual modeling and automatic modeling, each with its own advantages and disadvantages.

Manual modeling: Using the density map, researchers build atomic models of biomacromolecules by hand in a graphical interface (for example, with the software COOT). This approach offers a high degree of freedom: when details of the density map are unclear, the researcher can make inferences based on known chemical information and experience. However, manual modeling is time-consuming, and the results depend on the researcher's experience and judgment.
Automatic modeling: Automatic modeling software (such as Phenix, Rosetta, ARP/wARP, and MDFF) generates atomic models from density maps automatically. This approach is efficient and consistent, and it reduces the influence of human factors. However, its accuracy may be limited when the density map resolution is low or the model complexity is high.

To address the drawbacks of existing solutions, we propose a new method, FFF ("Fragment-guided Flexible Fitting") [[5]](https://openaccess.thecvf.com/content/CVPR2023/html/Chen_FFF_Fragment-Guided_Flexible_Fitting_for_Building_Complete_Protein_Structures_CVPR_2023_paper.html), which builds more accurate and complete protein structures from cryo-EM experimental data. FFF achieves more reliable cryo-EM structure modeling by combining protein structure prediction and protein structure recognition with flexible fitting algorithms.

[1]

from PIL import Image


def show_image(image_path):
    # Flatten any transparency onto a white background before displaying.
    image = Image.open(image_path).convert("RGBA")
    new_image = Image.new("RGBA", image.size, "WHITE")
    new_image.paste(image, mask=image)
    new_image.convert("RGB").show()

[2]

show_image("/demo/imgs/pipeline.png")

FFF tutorial

Next, we use a real case to demonstrate the effect of the FFF algorithm.

ASCT2: A multi-conformational protein

ASCT2 is a transporter protein. To carry out its function, it adopts multiple stable conformations, of which there are two main ones: one is open toward the inside of the cell (6RVX) [[1]](https://doi.org/10.1038/s41467-019-11363-x), and the other is open toward the outside of the cell (7BCQ) [[2]](https://doi.org/10.1073/pnas.2104093118). In this case, we will build the all-atom model of the second (outward-open) conformation based on the 7BCQ density map. Shown below are the conformations of ASCT2 (figure from [1]) and the published structure of the target conformation.

[3]

# Garaeva, A.A., Guskov, A., Slotboom, D.J. et al.
# A one-gate elevator mechanism for the human neutral amino acid transporter ASCT2.
# Nat. Commun. 10, 3427 (2019) CC-BY 4.0 (https://doi.org/10.1038/s41467-019-11363-x)
show_image("/demo/imgs/multi_confs.png")

[4]

show_image("/work/showcase/7BCQ_rcsb.png")

AlphaFold2 predicted structure

The following is the structure predicted by the AlphaFold2 algorithm [3]. It is an inward-open structure, which differs substantially from our target structure.

[5]

%%bash
echo "TM score (AlphaFold)"
TMscore \
    /demo/af2_pdb/7BCQ.pdb \
    /demo/pdb/7BCQ.pdb > TMscore.af2.log
grep "TM-score =" TMscore.af2.log

TM score (AlphaFold)
TM-score    = 0.5766  (d0= 7.54)
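
For context, TM-score is a length-normalized similarity measure in the range (0, 1], and `d0` in the log above is its distance scale. A minimal sketch of the standard definition (Zhang and Skolnick) is given below. The function names, the 442-residue length, and the toy distances are illustrative assumptions, and the `TMscore` program additionally searches for the best superposition, which this sketch does not do.

```python
def tm_d0(target_length: int) -> float:
    """Distance scale d0 (in Å) for a reference structure of the given length."""
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances, target_length: int) -> float:
    """TM-score for one fixed superposition.

    distances: Cα-Cα distances (Å) of the aligned residue pairs.
    target_length: number of residues in the reference structure.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# Hypothetical length: about 442 residues gives a d0 close to the logged 7.54 Å.
print(round(tm_d0(442), 2))
# Identical structures (all distances zero) score 1.0.
print(tm_score([0.0] * 442, 442))
```

Because each residue pair contributes at most 1 and the sum is divided by the reference length, unaligned or poorly fitted regions lower the score, which is why a score near 0.58 indicates a substantially different conformation.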

[6]

show_image("/work/showcase/7BCQ_af.png")

[7]

show_image("/work/showcase/af_vs_rcsb.png")

Traditional electron microscope structure construction method

MDFF (Molecular Dynamics Flexible Fitting) is a traditional method for cryo-EM structure construction [4]. Let's see how well it builds the structure in this case.

The results below show that MDFF fails to build a three-dimensional atomic model that agrees with the density map. The main reason is that MDFF easily gets trapped in a local optimum when the initial structure and the target structure are very different.
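
Why a fit can get stuck is easy to see on a toy problem. The sketch below runs plain gradient descent on a one-dimensional double-well function. This function is an illustrative stand-in chosen here, not the MDFF energy function. Started on the wrong side of the barrier, the descent converges to the shallower local minimum and never reaches the global one.

```python
def energy(x: float) -> float:
    """Toy double-well: global minimum near x = -1.04, local minimum near x = 0.96."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def gradient(x: float) -> float:
    """Analytical derivative of energy()."""
    return 4.0 * x * (x * x - 1.0) + 0.3


def descend(x: float, learning_rate: float = 0.01, steps: int = 5000) -> float:
    """Plain gradient descent from the starting point x."""
    for _ in range(steps):
        x -= learning_rate * gradient(x)
    return x


x_far = descend(1.5)    # starts on the far side of the barrier
x_near = descend(-1.5)  # starts in the basin of the global minimum
print(f"start= 1.5 -> x={x_far:.3f}, E={energy(x_far):.3f}")
print(f"start=-1.5 -> x={x_near:.3f}, E={energy(x_near):.3f}")
```

The two runs differ only in their starting point, which mirrors the situation here: the AlphaFold2 model starts far from the target conformation.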

[8]

show_image("/work/showcase/7BCQ_mdff.png")

[9]

show_image("/work/showcase/mdff_vs_rcsb.png")

Use FFF algorithm

Finally, let's try to use FFF to automatically build the all-atom model structure of ASCT2.

[10]

show_image("/demo/imgs/networks.png")

Variable definitions

[11]

import os
import shutil
import subprocess
from typing import Optional

from dpemm.minimize_struct import minimize_struct
from dpemm.prep_grid import prep_grid
from dpemm.prep_restr import prep_restr
from dpemm.run_TMD import run_TMD
from dpemm.run_CMD import run_CMD
from dpemm.preprocess.prep_std_map import prep_std_map
from dpemm.preprocess.prep_apix_map import prep_apix_map
from dpemm.utils import basename_no_ext, rm_ext

[12]

# Input files
input_pdb = "/demo/af2_pdb/7BCQ.pdb"
input_map = "/demo/map/7BCQ.apix1.mrc"
input_fasta = "/demo/fasta/7BCQ.fasta"
input_infer_weights = "/ckpt/fffw_304000.pt"
input_infer_config = "/ckpt/train_config.json"

# Output files
output_dir = "/data/fff_demo/output"
output_pdb = f"{output_dir}/7bcq_fff.pdb"
output_dcd = f"{output_dir}/7bcq_fff.dcd"

# Published structure used as the reference for the TM-score comparison
input_pdb_ref = "/demo/pdb/7BCQ.pdb"

# TMD settings
tmd_num_steps = 12000
tmd_update_freq = 1000

# CMD settings
cmd_total_steps = 5000
cmd_num_stages = 2
cmd_steps_per_stage = 500
cmd_k = 1e4
cmd_mdff_k = 200.
cmd_temperature = 10.

# General settings
gpu_device = 0
grid_res = 3.0
debug = False
time_it = False

# To skip steps that have already been completed, set cache_files to True.
cache_files = False

os.makedirs(output_dir, exist_ok=True)

1. Density map recognition

We first need to resample the input density map to a standard voxel size of 1 Å, so that the input map and the maps used for model training have the same voxel size. In addition, we also need to generate a variance map.
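
As a rough illustration of what resampling to a 1 Å voxel implies for the grid, the sketch below computes the new grid dimensions from the original voxel size. The function name and the 1.35 Å example are assumptions made here. This is not the implementation of `prep_apix_map`, which also has to interpolate the density values onto the new grid.

```python
def resampled_shape(shape, apix, target_apix=1.0):
    """Grid dimensions that cover the same physical box at a new voxel size.

    shape: (nx, ny, nz) of the original map; apix: original voxel size in Å.
    """
    return tuple(max(1, round(n * apix / target_apix)) for n in shape)


# A map that is already sampled at 1 Å keeps its shape.
print(resampled_shape((64, 64, 64), apix=1.0))
# Hypothetical map sampled at 1.35 Å per voxel.
print(resampled_shape((200, 200, 200), apix=1.35))
```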

[13]

output_std_map = os.path.join(
    output_dir,
    "{}_std_map.mrc".format(basename_no_ext(input_map)),
)
output_apix_map = os.path.join(
    output_dir,
    "{}_apix_map.mrc".format(basename_no_ext(input_map)),
)

# Resample the input map to a voxel size of 1 Å.
if cache_files and os.path.exists(output_apix_map):
    print(f"cached: {output_apix_map}")
else:
    prep_apix_map(
        input_map=input_map,
        output_map=output_apix_map,
        output_apix=1.0,
        debug=debug,
    )

# Generate the second map from the resampled map.
if cache_files and os.path.exists(output_std_map):
    print(f"cached: {output_std_map}")
else:
    prep_std_map(
        input_map=output_apix_map,
        output_map=output_std_map,
        debug=debug,
    )

os.listdir(output_dir)

['7BCQ.apix1.ccp4',
 '7BCQ.apix1_apix_map.mrc',
 '7BCQ.apix1_res_3.0.dx',
 '7BCQ.apix1_std_map.mrc',
 '7BCQ_clean.pdb',
 '7BCQ_clean_chain.pdb',
 '7BCQ_clean_clean.pdb',
 '7BCQ_clean_no_hetero.pdb',
 '7BCQ_cmd.dcd',
 '7BCQ_cmd.pdb',
 '7BCQ_cmd.rst',
 '7BCQ_cmd.tmscore.txt',
 '7BCQ_cmd_cmd_config.yml',
 '7BCQ_infer.cif',
 '7BCQ_infer.pdb',
 '7BCQ_infer.txt',
 '7BCQ_infer_backbone.mrc',
 '7BCQ_restr.exb',
 '7BCQ_restr_config.yml',
 '7BCQ_tmd.pdb',
 '7BCQ_tmd.tmscore.txt',
 '7BCQ_tmd_raw.pdb',
 '7BCQ_tmd_tmd.dcd',
 '7BCQ_tmd_tmd.rst',
 '7BCQ_tmd_tmd_config.yml',
 '7bcq_fff.dcd',
 '7bcq_fff.pdb']

3D fragment structure prediction

We can now run recognition on the density map and generate a set of protein fragments. Fragment identification relies on several kinds of predicted information, including the probability, position, and amino acid type of $C_{\alpha}$ atoms, as well as the pseudo-peptide vectors.
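
To make the idea of confidence-filtered fragments concrete, here is a minimal sketch: residues whose predicted confidence passes a threshold are grouped into runs of consecutive residue indices, and runs shorter than a minimum length are dropped. The data layout and the function name are hypothetical, and the exact semantics of the `--confidence` and `--length-cutoff` options of `fff infer` may differ from this.

```python
from typing import Dict, List


def group_fragments(
    confidence: Dict[int, float],
    min_confidence: float = 0.3,
    min_length: int = 2,
) -> List[List[int]]:
    """Group confident residues into fragments of consecutive residue indices."""
    kept = sorted(i for i, c in confidence.items() if c >= min_confidence)
    fragments: List[List[int]] = []
    for idx in kept:
        if fragments and idx == fragments[-1][-1] + 1:
            fragments[-1].append(idx)  # extends the current fragment
        else:
            fragments.append([idx])    # starts a new fragment
    return [f for f in fragments if len(f) >= min_length]


# Hypothetical per-residue confidences (residue index -> score).
scores = {1: 0.9, 2: 0.8, 3: 0.1, 4: 0.7, 5: 0.6, 6: 0.5, 8: 0.9}
print(group_fragments(scores))
```

In this toy input, residue 3 falls below the threshold and splits the chain, and the isolated residue 8 is discarded by the length cutoff.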

[14]

output_infer_txt = os.path.join(
    output_dir,
    "{}_infer.txt".format(basename_no_ext(input_pdb)),
)
output_infer_cif = f"{rm_ext(output_infer_txt)}.cif"
output_infer_pdb = f"{rm_ext(output_infer_txt)}.pdb"
output_backbone_map = f"{rm_ext(output_infer_txt)}_backbone.mrc"

cmd = [
    "fff",
    "infer",
    "--output-txt", output_infer_txt,
    "--output-cif", output_infer_cif,
    "--output-pdb", output_infer_pdb,
    "--input-config", input_infer_config,
    "--input-weights", input_infer_weights,
    "--input-raw-map", output_apix_map,
    "--input-std-map", output_std_map,
    "--input-fasta", input_fasta,
    "--output-backbone-map", output_backbone_map,
    "--confidence", "0.3",
    "--length-cutoff", "2",
    "--device", str(gpu_device),
]

if cache_files and os.path.exists(output_infer_txt):
    print(f"cached: {output_infer_txt}")
else:
    # check=True stops the notebook here if inference fails.
    subprocess.run(cmd, check=True)
    cmd_str = " ".join(cmd)
    print(cmd_str)

os.listdir(output_dir)

/opt/conda/envs/dpemm/lib/python3.9/site-packages/torch/nn/functional.py:3704: UserWarning: nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.
  warnings.warn("nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.")
(64, 64, 64) -> (64, 64, 64)
0
32
match domain from given fasta: /demo/fasta/7BCQ.fasta
num_residue: 236
num_domain: 18
domain mean length: 13.11111111111111
/data/fff_demo/output/7BCQ_infer.txt
/data/fff_demo/output/7BCQ_infer.cif
/data/fff_demo/output/7BCQ_infer.pdb
/data/fff_demo/output/7BCQ_infer_backbone.mrc
fff infer --output-txt /data/fff_demo/output/7BCQ_infer.txt --output-cif /data/fff_demo/output/7BCQ_infer.cif --output-pdb /data/fff_demo/output/7BCQ_infer.pdb --input-config /ckpt/train_config.json --input-weights /ckpt/fffw_304000.pt --input-raw-map /data/fff_demo/output/7BCQ.apix1_apix_map.mrc --input-std-map /data/fff_demo/output/7BCQ.apix1_std_map.mrc --input-fasta /demo/fasta/7BCQ.fasta --output-backbone-map /data/fff_demo/output/7BCQ_infer_backbone.mrc --confidence 0.3 --length-cutoff 2 --device 0

['7BCQ.apix1.ccp4',
 '7BCQ.apix1_apix_map.mrc',
 '7BCQ.apix1_res_3.0.dx',
 '7BCQ.apix1_std_map.mrc',
 '7BCQ_clean.pdb',
 '7BCQ_clean_chain.pdb',
 '7BCQ_clean_clean.pdb',
 '7BCQ_clean_no_hetero.pdb',
 '7BCQ_cmd.dcd',
 '7BCQ_cmd.pdb',
 '7BCQ_cmd.rst',
 '7BCQ_cmd.tmscore.txt',
 '7BCQ_cmd_cmd_config.yml',
 '7BCQ_infer.cif',
 '7BCQ_infer.pdb',
 '7BCQ_infer.txt',
 '7BCQ_infer_backbone.mrc',
 '7BCQ_restr.exb',
 '7BCQ_restr_config.yml',
 '7BCQ_tmd.pdb',
 '7BCQ_tmd.tmscore.txt',
 '7BCQ_tmd_raw.pdb',
 '7BCQ_tmd_tmd.dcd',
 '7BCQ_tmd_tmd.rst',
 '7BCQ_tmd_tmd_config.yml',
 '7bcq_fff.dcd',
 '7bcq_fff.pdb']

Fragment prediction effect

Shown below is the predicted fragment structure.

[15]

show_image("/work/showcase/7BCQ_domain.png")

Shown below is a comparison between the backbone density map predicted by FFF (gray) and the input density map (light blue). The backbone density map represents the probability that each voxel belongs to a backbone atom (C, $C_{\alpha}$, N).

The subsequent density-map fitting uses the input density map by default. Users can also fit against the backbone density map instead.

[16]

show_image("/work/showcase/raw_vs_bb_map.png")

2. Protein full atomic structure construction

With the predicted protein fragments in hand, we use them as structural constraints to build a complete protein structure. The flow chart of this part of the algorithm is shown below.

[17]

show_image("/demo/imgs/tmdff.png")

Initial protein structure and processing

The protein structure files we get usually have a lot of missing information (hydrogen atoms, side chains of certain residues, etc.). We first need to repair the input initial structure.

From the results below, it can be clearly seen that the results of FFF construction are very close to the target structure, which can basically meet the needs of cryo-EM structure construction.

[18]

output_pdb_minimized = os.path.join(
    output_dir,
    "{}_clean.pdb".format(basename_no_ext(input_pdb)),
)

if cache_files and os.path.isfile(output_pdb_minimized):
    print("Using cached file: {}".format(output_pdb_minimized))
else:
    minimize_struct(
        input_pdb=input_pdb,
        output_pdb=output_pdb_minimized,
        debug=debug,
        time_it=time_it,
    )

Finding missing atoms...
Adding missing atoms...
Writing output...
Done.
Load PDB... Done.
Re-organize chain id... Done.
Warning: importing 'simtk.openmm' is deprecated.  Import 'openmm' instead.
Finding missing residues...
Finding nonstandard residues...
Replacing nonstandard residues...
Finding missing atoms...
Adding missing atoms...
Adding missing hydrogens...
Writing output...
Done.
Find GB force
Minimization...
Before:
437100.15625 kJ/mol
-20948833.0407383 kJ/(nm mol)
After:
-54571.84375 kJ/mol
-1248.0090580619872 kJ/(nm mol)
Done.

Generation of grid file & structural constraint file

Next, we prepare the grid file and the structural constraint file required for the molecular dynamics simulation.

[19]

input_map_ccp4 = os.path.join(
    output_dir,
    "{}.ccp4".format(basename_no_ext(input_map)),
)
output_grid = "{}_res_{}.dx".format(
    rm_ext(input_map_ccp4),
    grid_res,
)

if cache_files and os.path.isfile(output_grid):
    print("Using cached file: {}".format(output_grid))
else:
    shutil.copy(input_map, input_map_ccp4)
    prep_grid(
        input_map=input_map_ccp4,
        output_grid=output_grid,
        res=grid_res,
    )

dpems grid --input /data/fff_demo/output/7BCQ.apix1.ccp4 --output /data/fff_demo/output/7BCQ.apix1_res_3.0.dx --rinp 3 --rout 3.0
>> input is a ccp4 file: /data/fff_demo/output/7BCQ.apix1.ccp4
>> origin from /data/fff_demo/output/7BCQ.apix1.ccp4: [55.65999806 86.019997   72.86399746]
GRID SIZE: 65 x 65 x 65
>> output grid origin [55.65999806 86.019997   72.86399746]

[20]

output_restr = os.path.join(
    output_dir,
    "{}_restr.exb".format(basename_no_ext(input_pdb)),
)

if cache_files and os.path.isfile(output_restr):
    print("Using cached file: {}".format(output_restr))
else:
    prep_restr(
        input_pdb=output_pdb_minimized,
        output_restr=output_restr,
    )

print("restraint file: {}".format(output_restr))

dpems optstruc --configure /data/fff_demo/output/7BCQ_restr_config.yml
PLATFORM: CUDA
Writing SSrestraint
Writing CHIRALrestraint
Writing CISrestraint
restraint file: /data/fff_demo/output/7BCQ_restr.exb

Structure Fitting

Next, we fit the initial structure to the predicted fragment structure. The grid file converted from the density map is also used to guide the structure fitting process.

[21]

output_pdb_tmd = os.path.join(
    output_dir,
    "{}_tmd.pdb".format(basename_no_ext(input_pdb)),
)

if cache_files and os.path.isfile(output_pdb_tmd):
    print("Using cached file: {}".format(output_pdb_tmd))
else:
    run_TMD(
        input_pdb_init=output_pdb_minimized,
        input_pdb_target=output_infer_pdb,
        input_restr=output_restr,
        output_pdb=output_pdb_tmd,
        num_steps=tmd_num_steps,
        tmd_update_freq=tmd_update_freq,
        gpu_device=gpu_device,
        debug=debug,
        time_it=time_it,
    )

dpems tmd --init-pdb /data/fff_demo/output/7BCQ_clean.pdb --restraint /data/fff_demo/output/7BCQ_restr.exb --coupling-config /data/fff_demo/output/7BCQ_tmd_tmd_config.yml --output-restart /data/fff_demo/output/7BCQ_tmd_tmd.rst --output-dcd /data/fff_demo/output/7BCQ_tmd_tmd.dcd --output-pdb /data/fff_demo/output/7BCQ_tmd_raw.pdb --output-pdb-aligned /data/fff_demo/output/7BCQ_tmd.pdb --temperature 10 --nsteps 12000 --traj-freq 1000 --report-freq 1000 --tmd-update-freq 1000 --platform CUDA
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.

>>> 236 atoms selected for TMD


>>> initial RMSD: 8.48 A


Stage 1: gamma = 0.92

#"Step","Potential Energy (kJ/mole)","Temperature (K)","Density (g/mL)","Speed (ns/day)","Time Remaining"
1000,-55395.15953086443,10.935697480964484,9.625561292924427,0,--
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
[stage 1] >>> save /data/fff_demo/output/7BCQ_tmd_raw.pdb
[stage 1] >>> save /data/fff_demo/output/7BCQ_tmd.pdb
[stage 1] >>> rmsd: 7.80 A (gamma = 0.9166666666666666)

Stage 2: gamma = 0.83

2000,-55288.67541526787,12.075340646822927,9.625561292924427,69.1,0:25
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
[stage 2] >>> save /data/fff_demo/output/7BCQ_tmd_raw.pdb
[stage 2] >>> save /data/fff_demo/output/7BCQ_tmd.pdb
[stage 2] >>> rmsd: 7.13 A (gamma = 0.8333333333333334)

Stage 3: gamma = 0.75

3000,-55083.09274333183,11.651242182053384,9.625561292924427,71.2,0:21
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
[stage 3] >>> save /data/fff_demo/output/7BCQ_tmd_raw.pdb
[stage 3] >>> save /data/fff_demo/output/7BCQ_tmd.pdb
[stage 3] >>> rmsd: 6.46 A (gamma = 0.75)

Stage 4: gamma = 0.67

4000,-54961.81337345173,12.08808337728884,9.625561292924427,72.6,0:19
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
[stage 4] >>> save /data/fff_demo/output/7BCQ_tmd_raw.pdb
[stage 4] >>> save /data/fff_demo/output/7BCQ_tmd.pdb
[stage 4] >>> rmsd: 5.77 A (gamma = 0.6666666666666667)

Stage 5: gamma = 0.58

5000,-54775.03003960011,12.387954444093356,9.625561292924427,72.7,0:16
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
[stage 5] >>> save /data/fff_demo/output/7BCQ_tmd_raw.pdb
[stage 5] >>> save /data/fff_demo/output/7BCQ_tmd.pdb
[stage 5] >>> rmsd: 5.05 A (gamma = 0.5833333333333333)

Stage 6: gamma = 0.50

6000,-54542.08228012238,11.673589617894702,9.625561292924427,73.3,0:14
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
[stage 6] >>> save /data/fff_demo/output/7BCQ_tmd_raw.pdb
[stage 6] >>> save /data/fff_demo/output/7BCQ_tmd.pdb
[stage 6] >>> rmsd: 4.34 A (gamma = 0.5)

Stage 7: gamma = 0.42

7000,-54390.491832383756,12.691338374008142,9.625561292924427,73.3,0:11
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
[stage 7] >>> save /data/fff_demo/output/7BCQ_tmd_raw.pdb
[stage 7] >>> save /data/fff_demo/output/7BCQ_tmd.pdb
[stage 7] >>> rmsd: 3.62 A (gamma = 0.41666666666666663)

Stage 8: gamma = 0.33

8000,-54243.49506762587,12.286542322856695,9.625561292924427,73.4,0:09
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
[stage 8] >>> save /data/fff_demo/output/7BCQ_tmd_raw.pdb
[stage 8] >>> save /data/fff_demo/output/7BCQ_tmd.pdb
[stage 8] >>> rmsd: 2.92 A (gamma = 0.33333333333333337)

Stage 9: gamma = 0.25

9000,-54145.08429337973,12.500200522709907,9.625561292924427,73.6,0:07
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
[stage 9] >>> save /data/fff_demo/output/7BCQ_tmd_raw.pdb
[stage 9] >>> save /data/fff_demo/output/7BCQ_tmd.pdb
[stage 9] >>> rmsd: 2.22 A (gamma = 0.25)

Stage 10: gamma = 0.17

10000,-53847.063096414015,12.377334237655072,9.625561292924427,73.9,0:04
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
[stage 10] >>> save /data/fff_demo/output/7BCQ_tmd_raw.pdb
[stage 10] >>> save /data/fff_demo/output/7BCQ_tmd.pdb
[stage 10] >>> rmsd: 1.62 A (gamma = 0.16666666666666663)

Stage 11: gamma = 0.08

11000,-53416.41181881514,12.860831477990114,9.625561292924427,73.9,0:02
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
[stage 11] >>> save /data/fff_demo/output/7BCQ_tmd_raw.pdb
[stage 11] >>> save /data/fff_demo/output/7BCQ_tmd.pdb
[stage 11] >>> rmsd: 1.01 A (gamma = 0.08333333333333337)

Stage 12: gamma = 0.00

12000,-52730.25720070057,14.061296279725285,9.625561292924427,73.9,0:00
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
[stage 12] >>> save /data/fff_demo/output/7BCQ_tmd_raw.pdb
[stage 12] >>> save /data/fff_demo/output/7BCQ_tmd.pdb
[stage 12] >>> rmsd: 0.57 A (gamma = 0.0)
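
The log above suggests a simple schedule: with 12000 steps and an update every 1000 steps there are 12 stages, and gamma falls linearly from 1 toward 0. A minimal sketch that reproduces the gamma values printed in the log follows. Reading gamma as the fraction of the initial RMSD that remains as the target at each stage is an interpretation made here, not something the log states, and the function name is hypothetical.

```python
def tmd_gamma_schedule(num_steps: int, update_freq: int):
    """Return the gamma value of each TMD stage, decreasing linearly to 0."""
    num_stages = num_steps // update_freq
    return [1.0 - stage / num_stages for stage in range(1, num_stages + 1)]


gammas = tmd_gamma_schedule(num_steps=12000, update_freq=1000)
initial_rmsd = 8.48  # Å, taken from the log above

for stage, gamma in enumerate(gammas, start=1):
    # Assumed interpretation: the target RMSD shrinks in proportion to gamma.
    target = gamma * initial_rmsd
    print(f"stage {stage:2d}: gamma = {gamma:.2f}, target RMSD ~ {target:.2f} A")
```

Under this reading, the RMSD values reported in the log (7.80 Å, 7.13 Å, and so on down to 0.57 Å) stay slightly above the scheduled targets, which is what one would expect if the structure trails a moving target.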

[22]

output_pdb_cmd = os.path.join(
    output_dir,
    "{}_cmd.pdb".format(basename_no_ext(input_pdb)),
)
output_dcd_cmd = os.path.join(
    output_dir,
    "{}_cmd.dcd".format(basename_no_ext(input_pdb)),
)

if cache_files and os.path.isfile(output_pdb_cmd):
    print("Using cached file: {}".format(output_pdb_cmd))
else:
    # Write trajectory frames and reports ten times over the run.
    cmd_traj_freq = int(cmd_total_steps / 10)
    cmd_report_freq = int(cmd_total_steps / 10)
    run_CMD(
        input_pdb_init=output_pdb_tmd,
        input_pdb_target=output_infer_pdb,
        input_restr=output_restr,
        output_pdb=output_pdb_cmd,
        output_dcd=output_dcd_cmd,
        total_steps=cmd_total_steps,
        traj_freq=cmd_traj_freq,
        report_freq=cmd_report_freq,
        temperature=cmd_temperature,
        cmd_selection='name CA',
        cmd_k=cmd_k,
        cmd_total_stages=cmd_num_stages,
        cmd_steps_per_stage=cmd_steps_per_stage,
        mdff_grid=output_grid,
        mdff_k=cmd_mdff_k,
        mdff_selection="all",
        platform="CUDA",
        gpu_device=gpu_device,
        debug=True,
        time_it=True,
    )

{'cmd_k': 10000.0,
 'cmd_selection': 'name CA',
 'cmd_steps_per_stage': 500,
 'cmd_total_stages': 2,
 'gpu_device': 0,
 'input_pdb_init': '/data/fff_demo/output/7BCQ_tmd.pdb',
 'input_pdb_target': '/data/fff_demo/output/7BCQ_infer.pdb',
 'input_restr': '/data/fff_demo/output/7BCQ_restr.exb',
 'output_dcd': '/data/fff_demo/output/7BCQ_cmd.dcd',
 'output_pdb': '/data/fff_demo/output/7BCQ_cmd.pdb',
 'output_rst': None,
 'platform': 'CUDA',
 'report_freq': 500,
 'temperature': 10.0,
 'total_steps': 5000,
 'traj_freq': 500}
dpems cmd --input-pdb /data/fff_demo/output/7BCQ_tmd.pdb --coupling-config /data/fff_demo/output/7BCQ_cmd_cmd_config.yml --output-restart /data/fff_demo/output/7BCQ_cmd.rst --output-dcd /data/fff_demo/output/7BCQ_cmd.dcd --output-pdb /data/fff_demo/output/7BCQ_cmd.pdb --temperature 10.0 --total-steps 5000 --traj-freq 500 --report-freq 500 --cmd-total-stages 2 --cmd-steps-per-stage 500 --platform CUDA --debug --restraint /data/fff_demo/output/7BCQ_restr.exb
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.07s.
CREATE BIAS USING MAP: /data/fff_demo/output/7BCQ.apix1_res_3.0.dx
INPUTMAP: 64 x 64 x 64
CREATEMAP: 64 x 64 x 64
>>> MDFF biases: [<openmm.openmm.CustomCompoundBondForce; proxy of <Swig Object of type 'OpenMM::CustomCompoundBondForce *' at 0x7f23b196ec90> >]
>>> All biases: [<openmm.openmm.CustomExternalForce; proxy of <Swig Object of type 'OpenMM::CustomExternalForce *' at 0x7f23b1945630> >, <openmm.openmm.CustomCompoundBondForce; proxy of <Swig Object of type 'OpenMM::CustomCompoundBondForce *' at 0x7f23b196ec90> >]
>>> Add restraints (SS, cis, chiral) using "/data/fff_demo/output/7BCQ_restr.exb"
CMD Stage 1: gamma: 0.500 (236 atoms restrained)
#"Step","Potential Energy (kJ/mole)","Temperature (K)","Density (g/mL)","Speed (ns/day)","Time Remaining"
500,-97090.2734375,11.759734960739115,9.625561292924427,0,--
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
>>> RMSD: 0.72 Å

>>> atom 82: xyz=[ 6.894629   11.00670433  8.39646816] nm; sys: xyz=[ 6.8741 11.0359  8.3503] nm; ref: xyz=[ 6.9458 10.9505  8.478 ] nm; 
CMD Stage 2: gamma: 1.000 (236 atoms restrained)
1000,-95746.671875,13.16433019533872,9.625561292924427,70.8,0:09
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.06s.
>>> RMSD: 0.65 Å

>>> atom 82: xyz=[ 6.92476416 10.98426247  8.44348335] nm; sys: xyz=[ 6.8741 11.0359  8.3503] nm; ref: xyz=[ 6.9458 10.9505  8.478 ] nm; 
>>> Run MD with constraints (4000 steps to go; 236 atoms restrained)
1500,-96080.484375,12.115721037961649,9.625561292924427,70.2,0:08
2000,-96188.5859375,10.847192694100114,9.625561292924427,97.9,0:05
2500,-96217.4140625,10.104542208772175,9.625561292924427,122,0:03
3000,-96229.640625,10.122088472905846,9.625561292924427,143,0:02
3500,-96230.3203125,10.066886989533359,9.625561292924427,162,0:01
4000,-96215.984375,9.754862200097236,9.625561292924427,176,0:00
4500,-96229.625,9.944912422566823,9.625561292924427,191,0:00
5000,-96230.578125,9.86418352907686,9.625561292924427,204,0:00
@> 236 atoms and 1 coordinate set(s) were parsed in 0.00s.
@> 6701 atoms and 1 coordinate set(s) were parsed in 0.07s.
>>> RMSD: 0.65 Å

>>> atom 82: xyz=[ 6.9252367  10.98064423  8.4386816 ] nm; sys: xyz=[ 6.8741 11.0359  8.3503] nm; ref: xyz=[ 6.9458 10.9505  8.478 ] nm; 
Done!
>>> Total number of steps: 5000
>>> output pdb: /data/fff_demo/output/7BCQ_cmd.pdb
Time elapsed: 12.634897708892822 s
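
The stage structure visible in this log can be summarized in a few lines. The sketch assumes, based only on the values printed above, that gamma ramps up linearly over the CMD stages (0.5, then 1.0 for two stages) and that the remaining steps run after the ramp has finished. The function name is hypothetical.

```python
def cmd_stage_plan(total_steps: int, num_stages: int, steps_per_stage: int):
    """Return (gamma value of each ramp stage, number of steps left after the ramp)."""
    gammas = [stage / num_stages for stage in range(1, num_stages + 1)]
    remaining = total_steps - num_stages * steps_per_stage
    return gammas, remaining


gammas, remaining = cmd_stage_plan(total_steps=5000, num_stages=2, steps_per_stage=500)
print(gammas)     # gamma of each ramp stage
print(remaining)  # steps run after the ramp
```

With the settings used in this notebook, the plan yields two ramp stages followed by 4000 further steps, which matches the "4000 steps to go" message in the log.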

3. Comparison of predicted and published structures

Finally we compare how far the predicted and published structures differ.

[23]

if input_pdb_ref is not None:
    print("Intermediate TM Score (after TMD))")
    output_tmscore_tmd = os.path.join(
        output_dir,
        "{}_tmd.tmscore.txt".format(basename_no_ext(input_pdb)),
    )
    # Write the full TMscore report to a file, then show only the score line.
    with open(output_tmscore_tmd, "w") as f:
        cmd = ["TMscore", output_pdb_tmd, input_pdb_ref]
        subprocess.run(cmd, stdout=f, check=True, text=True)
    subprocess.run(["grep", "TM-score =", output_tmscore_tmd],
                   check=True, text=True)
    print("-----------------")

    print("Final TM score (after CMD)")
    output_tmscore_cmd = os.path.join(
        output_dir,
        "{}_cmd.tmscore.txt".format(basename_no_ext(input_pdb)),
    )
    with open(output_tmscore_cmd, "w") as f:
        cmd = ["TMscore", output_pdb_cmd, input_pdb_ref]
        subprocess.run(cmd, stdout=f, check=True, text=True)
    subprocess.run(["grep", "TM-score =", output_tmscore_cmd],
                   check=True, text=True)
    print("-----------------")

Intermediate TM Score (after TMD))
TM-score    = 0.8989  (d0= 7.54)
-----------------
Final TM score (after CMD)
TM-score    = 0.9096  (d0= 7.54)
-----------------
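
If the score is needed as a number (for example, to compare runs), the line can be parsed in Python instead of being filtered with `grep`. A minimal sketch follows, assuming the report keeps the `TM-score    = value  (d0= value)` layout shown above. The helper name is hypothetical.

```python
import re
from typing import Optional

# Matches a line such as "TM-score    = 0.9096  (d0= 7.54)".
TM_LINE = re.compile(r"^TM-score\s*=\s*([0-9.]+)", re.MULTILINE)


def parse_tm_score(text: str) -> Optional[float]:
    """Return the TM-score found in a TMscore report, or None if it is absent."""
    match = TM_LINE.search(text)
    return float(match.group(1)) if match else None


sample = "TM-score    = 0.9096  (d0= 7.54)\n"
print(parse_tm_score(sample))
```

In the notebook, the same helper could be applied to the contents of the report file, for instance the file written to `output_tmscore_cmd`.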

Final output file

Finally, we copy the intermediate files to the final output files.

[24]

shutil.copy(output_pdb_cmd, output_pdb)
shutil.copy(output_dcd_cmd, output_dcd)
print(f">>> output pdb: {output_pdb}")
print(f">>> output dcd: {output_dcd}")

>>> output pdb: /data/fff_demo/output/7bcq_fff.pdb
>>> output dcd: /data/fff_demo/output/7bcq_fff.dcd

Output structure display

Shown below is the structure predicted by FFF (black) in the input density map, compared with the published structure (red). The quality of the built structure depends on the local quality of the density map. In regions with weak density features (such as loop regions), the structure prediction and construction process is less constrained, so the difference from the published structure is larger. In regions with strong density features, the automatically built structure largely agrees with the published structure.
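
One way to quantify this observation is a per-residue $C_{\alpha}$ deviation between the built model and the published model after superposition. A minimal sketch with made-up coordinates follows. The function name, the 2 Å threshold, and the coordinates are assumptions for illustration only.

```python
import math
from typing import Dict, Tuple

Coord = Tuple[float, float, float]


def ca_deviations(model: Dict[int, Coord], reference: Dict[int, Coord]) -> Dict[int, float]:
    """Cα-Cα distance (Å) for every residue present in both structures.

    Both structures are assumed to be superimposed already.
    """
    common = sorted(set(model) & set(reference))
    return {i: math.dist(model[i], reference[i]) for i in common}


# Hypothetical Cα coordinates keyed by residue number.
model = {10: (0.0, 0.0, 0.0), 11: (3.8, 0.0, 0.0), 12: (7.6, 3.0, 0.0)}
reference = {10: (0.0, 0.5, 0.0), 11: (3.8, 0.0, 0.3), 12: (7.6, 0.0, 0.0)}

deviations = ca_deviations(model, reference)
poorly_fit = [i for i, d in deviations.items() if d > 2.0]
print(poorly_fit)
```

Residues flagged this way can then be inspected against the density map to see whether they fall in weak-density regions such as loops.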

[25]

show_image("/work/showcase/7BCQ_FFF.png")

[26]

show_image("/work/showcase/FFF_vs_rcsb.png")

Summary

Although many methods exist for building all-atom models from cryo-EM data, accurately and automatically building structures from medium-resolution density maps remains a challenge. FFF automates protein structure building by combining three-dimensional recognition algorithms from computer vision with molecular dynamics simulation, and its accuracy exceeds that of traditional methods and of protein structure prediction methods. In the future, the FFF algorithm will be extended to building structures of DNA, RNA, and small molecules. In addition, we have developed an App (https://app.bohrium.dp.tech/fff) based on the FFF algorithm, so that more people can apply FFF in their cryo-EM data processing workflows.

References

[1] Garaeva, A.A., Guskov, A., Slotboom, D.J. et al. A one-gate elevator mechanism for the human neutral amino acid transporter ASCT2. Nat. Commun. 10, 3427 (2019)
[2] Garibsingh, R.A., Ndaru, E., Garaeva, A.A., Shi, Y., Zielewicz, L., Zakrepine, P., Bonomi, M., Slotboom, D.J., Paulino, C., Grewer, C., Schlessinger, A. Rational design of ASCT2 inhibitors using an integrated experimental-computational approach. Proc. Natl. Acad. Sci. U.S.A. 118, e2104093118 (2021)
[3] Jumper, J., Evans, R., Pritzel, A. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021)
[4] Trabuco, L.G., Villa, E., Mitra, K., Frank, J., Schulten, K. Flexible fitting of atomic structures into electron microscopy maps using molecular dynamics. Structure 16, 673–683 (2008)
[5] Chen, W., Wang, X., Wang, Y. FFF: Fragment-Guided Flexible Fitting for Building Complete Protein Structures. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 19776–19785 (2023)
