Basic AlphaFold2 Workflow

python

structure prediction

Deep Learning


dingshizhi

Updated 2025-04-09

Recommended image: structure-pred-af:version_1

Recommended machine: c12_m92_1 * NVIDIA V100

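1. Environment setup
2. Set parameters
3. Prepare the input FASTA file
4. Run AlphaFold2
5. Inspect the ranking
6. Visualize the results
7. Assess prediction quality
8. Inspect the timings
References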

This notebook walks through the basic AlphaFold workflow and visualizes the results. To save time, it uses precomputed MSAs.

1. Environment setup

[1]

import os
import sys
import subprocess
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import py3Dmol
from IPython.display import display, HTML

2. Set parameters

Set the parameters AlphaFold2 needs for this run.

[2]

# Paths
ALPHAFOLD_PATH = "/opt/alphafold_project/alphafold-2.3.1"  # AlphaFold install path
DATABASE_DIR = "/share/structure_prediction/af2_database/"  # database path
OUTPUT_DIR = "/opt/alphafold_project/alphafold-2.3.1/precomputed_test/"  # output path
FASTA_PATH = "/opt/alphafold_project/alphafold-2.3.1/example/query.fasta"  # input FASTA file
MAX_TEMPLATE_DATE = "2020-05-14"  # template cutoff date
# In a real prediction the MSA search takes a long time, so this demo reuses precomputed MSAs
USE_PRECOMPUTED_MSAS = "true"  # whether to use precomputed MSAs
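Before launching a run it can help to confirm that the configured paths actually exist. The following is a minimal sketch that is not part of the original notebook; `find_missing_paths` is a hypothetical helper name, and it assumes the settings above are plain path strings.

```python
from pathlib import Path


def find_missing_paths(named_paths):
    """Return the labels whose paths do not exist on disk.

    `named_paths` maps a label (e.g. "DATABASE_DIR") to a path string.
    Hypothetical helper for illustration; it is not part of AlphaFold.
    """
    return [name for name, path in named_paths.items()
            if not Path(path).exists()]


# Example usage with the notebook's settings. OUTPUT_DIR is left out
# because the notebook creates it later with os.makedirs.
# missing = find_missing_paths({
#     "ALPHAFOLD_PATH": ALPHAFOLD_PATH,
#     "DATABASE_DIR": DATABASE_DIR,
#     "FASTA_PATH": FASTA_PATH,
# })
# if missing:
#     print("Missing paths:", ", ".join(missing))
```

Checking up front avoids discovering a wrong database path only after the run has started.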

3. Prepare the input FASTA file

Inspect the contents of the input FASTA file.

[3]

# Print the FASTA file contents
# FASTA_PATH is already absolute, so it is opened directly
with open(FASTA_PATH, 'r') as f:
    fasta_content = f.read()
print(fasta_content)

>dummy_sequence
GWSTELEKHREELKEFLKKEGITNVEIRIDNGRLEVRVEGGTERLKRFLEELRQKLEKKGYTVDIKIE
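A quick sanity check on the input can catch malformed files before a long run. The sketch below is an illustration rather than part of the original notebook: `parse_fasta` and `STANDARD_AA` are hypothetical names, and the parser handles only plain single-line or multi-line FASTA records.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard residues

example = """>dummy_sequence
GWSTELEKHREELKEFLKKEGITNVEIRIDNGRLEVRVEGGTERLKRFLEELRQKLEKKGYTVDIKIE
"""
for name, seq in parse_fasta(example):
    unusual = sorted(set(seq) - STANDARD_AA)
    print(f"{name}: {len(seq)} residues, non-standard letters: {unusual}")
```

The example sequence has 68 residues, which matches the `(4, 68)` feature shapes reported in the run log.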

4. Run AlphaFold2

[4]

# Make sure the output directory exists
# The run takes roughly 10-15 minutes
os.makedirs(OUTPUT_DIR, exist_ok=True)
# Build the AlphaFold2 command
cmd = f"cd {ALPHAFOLD_PATH} && bash run_alphafold.sh \
       -d {DATABASE_DIR} \
       -o {os.path.abspath(OUTPUT_DIR)} \
       -f {os.path.join(ALPHAFOLD_PATH, FASTA_PATH)} \
       -t {MAX_TEMPLATE_DATE} \
       -p {USE_PRECOMPUTED_MSAS}"
print(f"Command: {cmd}")
# Run the command (IPython shell escape)
!{cmd}

Command: cd /opt/alphafold_project/alphafold-2.3.1 && bash run_alphafold.sh        -d /share/structure_prediction/af2_database/        -o /opt/alphafold_project/alphafold-2.3.1/precomputed_test        -f /opt/alphafold_project/alphafold-2.3.1/example/query.fasta        -t 2020-05-14        -p true
I0409 16:49:26.580164 139974862964544 templates.py:857] Using precomputed obsolete pdbs /share/structure_prediction/af2_database//pdb_mmcif/obsolete.dat.
I0409 16:49:30.901414 139974862964544 xla_bridge.py:353] Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to find driver in registry given worker: 
I0409 16:49:31.044875 139974862964544 xla_bridge.py:353] Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: Host Interpreter CUDA
I0409 16:49:31.045314 139974862964544 xla_bridge.py:353] Unable to initialize backend 'tpu': module 'jaxlib.xla_extension' has no attribute 'get_tpu_client'
I0409 16:49:31.045428 139974862964544 xla_bridge.py:353] Unable to initialize backend 'plugin': xla_extension has no attributes named get_plugin_device_client. Compile TensorFlow with //tensorflow/compiler/xla/python:enable_plugin_device set to true (defaults to false) to enable this.
I0409 16:49:55.518093 139974862964544 run_alphafold.py:386] Have 5 models: ['model_1_pred_0', 'model_2_pred_0', 'model_3_pred_0', 'model_4_pred_0', 'model_5_pred_0']
I0409 16:49:55.518320 139974862964544 run_alphafold.py:403] Using random seed 442039092369751438 for the data pipeline
I0409 16:49:55.518614 139974862964544 run_alphafold.py:161] Predicting query
W0409 16:49:55.518854 139974862964544 pipeline.py:100] Reading MSA from file /opt/alphafold_project/alphafold-2.3.1/precomputed_test/query/msas/uniref90_hits.sto
W0409 16:49:55.519006 139974862964544 pipeline.py:100] Reading MSA from file /opt/alphafold_project/alphafold-2.3.1/precomputed_test/query/msas/mgnify_hits.sto
I0409 16:49:55.519525 139974862964544 hhsearch.py:85] Launching subprocess "/opt/mamba/bin/hhsearch -i /tmp/tmpk141pi3w/query.a3m -o /tmp/tmpk141pi3w/output.hhr -maxseq 1000000 -d /share/structure_prediction/af2_database//pdb70/pdb70"
I0409 16:49:55.608169 139974862964544 utils.py:36] Started HHsearch query
I0409 16:53:04.004565 139974862964544 utils.py:40] Finished HHsearch query in 188.396 seconds
W0409 16:53:04.031605 139974862964544 pipeline.py:100] Reading MSA from file /opt/alphafold_project/alphafold-2.3.1/precomputed_test/query/msas/bfd_uniref_hits.a3m
I0409 16:53:04.031926 139974862964544 templates.py:878] Searching for template for: GWSTELEKHREELKEFLKKEGITNVEIRIDNGRLEVRVEGGTERLKRFLEELRQKLEKKGYTVDIKIE
I0409 16:53:04.032404 139974862964544 templates.py:718] hit 6mrr_A did not pass prefilter: Template is an exact subsequence of query with large coverage. Length ratio: 1.0.
I0409 16:53:04.032500 139974862964544 templates.py:912] Skipped invalid hit 6MRR_A foldit1; De novo protein, Foldit; 1.18A {synthetic construct}, error: None, warning: None
I0409 16:53:04.526734 139974862964544 templates.py:267] Found an exact template match 6q64_A.
I0409 16:53:05.057441 139974862964544 templates.py:267] Found an exact template match 4s3k_A.
I0409 16:53:05.531115 139974862964544 templates.py:267] Found an exact template match 5jh8_A.
I0409 16:53:05.828265 139974862964544 templates.py:267] Found an exact template match 1jnd_A.
I0409 16:53:06.471878 139974862964544 templates.py:267] Found an exact template match 5y2a_B.
I0409 16:53:08.018202 139974862964544 templates.py:267] Found an exact template match 4wiw_B.
I0409 16:53:08.031688 139974862964544 templates.py:267] Found an exact template match 4wiw_D.
I0409 16:53:08.543946 139974862964544 templates.py:267] Found an exact template match 6jm7_A.
I0409 16:53:08.772002 139974862964544 templates.py:267] Found an exact template match 6jmb_A.
I0409 16:53:09.166378 139974862964544 templates.py:267] Found an exact template match 4q6t_A.
I0409 16:53:09.689095 139974862964544 templates.py:267] Found an exact template match 3oa5_B.
I0409 16:53:10.124649 139974862964544 templates.py:267] Found an exact template match 5y2c_A.
I0409 16:53:10.233451 139974862964544 templates.py:267] Found an exact template match 5cuk_A.
I0409 16:53:11.684303 139974862964544 templates.py:267] Found an exact template match 4a5q_E.
I0409 16:53:11.934331 139974862964544 templates.py:267] Found an exact template match 5y2b_A.
I0409 16:53:12.207329 139974862964544 templates.py:267] Found an exact template match 4lgx_A.
I0409 16:53:12.808989 139974862964544 templates.py:267] Found an exact template match 4w5u_A.
I0409 16:53:13.496593 139974862964544 templates.py:267] Found an exact template match 6jav_A.
I0409 16:53:13.854119 139974862964544 templates.py:267] Found an exact template match 3cz8_A.
I0409 16:53:13.866688 139974862964544 templates.py:267] Found an exact template match 3cz8_B.
I0409 16:53:13.879991 139974862964544 pipeline.py:234] Uniref90 MSA size: 2 sequences.
I0409 16:53:13.880134 139974862964544 pipeline.py:235] BFD MSA size: 1 sequences.
I0409 16:53:13.880180 139974862964544 pipeline.py:236] MGnify MSA size: 2 sequences.
I0409 16:53:13.880219 139974862964544 pipeline.py:237] Final (deduplicated) MSA size: 2 sequences.
I0409 16:53:13.880470 139974862964544 pipeline.py:239] Total number of templates (NB: this can include bad templates and is later filtered to top 4): 20.
I0409 16:53:13.882184 139974862964544 run_alphafold.py:191] Running model model_1_pred_0 on query
I0409 16:53:17.171739 139974862964544 model.py:165] Running predict with shape(feat) = {'aatype': (4, 68), 'residue_index': (4, 68), 'seq_length': (4,), 'template_aatype': (4, 4, 68), 'template_all_atom_masks': (4, 4, 68, 37), 'template_all_atom_positions': (4, 4, 68, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 68), 'msa_mask': (4, 508, 68), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 68, 3), 'template_pseudo_beta_mask': (4, 4, 68), 'atom14_atom_exists': (4, 68, 14), 'residx_atom14_to_atom37': (4, 68, 14), 'residx_atom37_to_atom14': (4, 68, 37), 'atom37_atom_exists': (4, 68, 37), 'extra_msa': (4, 5120, 68), 'extra_msa_mask': (4, 5120, 68), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 508, 68), 'true_msa': (4, 508, 68), 'extra_has_deletion': (4, 5120, 68), 'extra_deletion_value': (4, 5120, 68), 'msa_feat': (4, 508, 68, 49), 'target_feat': (4, 68, 22)}
I0409 16:55:59.699222 139974862964544 model.py:175] Output shape was {'distogram': {'bin_edges': (63,), 'logits': (68, 68, 64)}, 'experimentally_resolved': {'logits': (68, 37)}, 'masked_msa': {'logits': (508, 68, 23)}, 'predicted_lddt': {'logits': (68, 50)}, 'structure_module': {'final_atom_mask': (68, 37), 'final_atom_positions': (68, 37, 3)}, 'plddt': (68,), 'ranking_confidence': ()}
I0409 16:55:59.699431 139974862964544 run_alphafold.py:203] Total JAX model model_1_pred_0 on query predict time (includes compilation time, see --benchmark): 162.5s
I0409 16:56:05.644319 139974862964544 amber_minimize.py:178] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 67 (GLU) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 16:56:05.726140 139974862964544 amber_minimize.py:408] Minimizing protein, attempt 1 of 100.
I0409 16:56:05.904006 139974862964544 amber_minimize.py:69] Restraining 574 / 1170 particles.
I0409 16:56:07.666443 139974862964544 amber_minimize.py:178] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 16:56:10.182704 139974862964544 amber_minimize.py:500] Iteration completed: Einit 801.10 Efinal -2145.04 Time 0.98 s num residue violations 0 num residue exclusions 0 
I0409 16:56:10.258744 139974862964544 run_alphafold.py:191] Running model model_2_pred_0 on query
I0409 16:56:12.154409 139974862964544 model.py:165] Running predict with shape(feat) = {'aatype': (4, 68), 'residue_index': (4, 68), 'seq_length': (4,), 'template_aatype': (4, 4, 68), 'template_all_atom_masks': (4, 4, 68, 37), 'template_all_atom_positions': (4, 4, 68, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 68), 'msa_mask': (4, 508, 68), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 68, 3), 'template_pseudo_beta_mask': (4, 4, 68), 'atom14_atom_exists': (4, 68, 14), 'residx_atom14_to_atom37': (4, 68, 14), 'residx_atom37_to_atom14': (4, 68, 37), 'atom37_atom_exists': (4, 68, 37), 'extra_msa': (4, 1024, 68), 'extra_msa_mask': (4, 1024, 68), 'extra_msa_row_mask': (4, 1024), 'bert_mask': (4, 508, 68), 'true_msa': (4, 508, 68), 'extra_has_deletion': (4, 1024, 68), 'extra_deletion_value': (4, 1024, 68), 'msa_feat': (4, 508, 68, 49), 'target_feat': (4, 68, 22)}
I0409 16:58:54.054380 139974862964544 model.py:175] Output shape was {'distogram': {'bin_edges': (63,), 'logits': (68, 68, 64)}, 'experimentally_resolved': {'logits': (68, 37)}, 'masked_msa': {'logits': (508, 68, 23)}, 'predicted_lddt': {'logits': (68, 50)}, 'structure_module': {'final_atom_mask': (68, 37), 'final_atom_positions': (68, 37, 3)}, 'plddt': (68,), 'ranking_confidence': ()}
I0409 16:58:54.054674 139974862964544 run_alphafold.py:203] Total JAX model model_2_pred_0 on query predict time (includes compilation time, see --benchmark): 161.9s
I0409 16:58:58.654386 139974862964544 amber_minimize.py:178] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 67 (GLU) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 16:58:58.740167 139974862964544 amber_minimize.py:408] Minimizing protein, attempt 1 of 100.
I0409 16:58:58.918553 139974862964544 amber_minimize.py:69] Restraining 574 / 1170 particles.
I0409 16:59:00.766768 139974862964544 amber_minimize.py:178] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 16:59:00.900718 139974862964544 amber_minimize.py:500] Iteration completed: Einit 1177.51 Efinal -2115.41 Time 1.00 s num residue violations 0 num residue exclusions 0 
I0409 16:59:00.948315 139974862964544 run_alphafold.py:191] Running model model_3_pred_0 on query
I0409 16:59:02.960916 139974862964544 model.py:165] Running predict with shape(feat) = {'aatype': (4, 68), 'residue_index': (4, 68), 'seq_length': (4,), 'is_distillation': (4,), 'seq_mask': (4, 68), 'msa_mask': (4, 512, 68), 'msa_row_mask': (4, 512), 'random_crop_to_size_seed': (4, 2), 'atom14_atom_exists': (4, 68, 14), 'residx_atom14_to_atom37': (4, 68, 14), 'residx_atom37_to_atom14': (4, 68, 37), 'atom37_atom_exists': (4, 68, 37), 'extra_msa': (4, 5120, 68), 'extra_msa_mask': (4, 5120, 68), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 512, 68), 'true_msa': (4, 512, 68), 'extra_has_deletion': (4, 5120, 68), 'extra_deletion_value': (4, 5120, 68), 'msa_feat': (4, 512, 68, 49), 'target_feat': (4, 68, 22)}
I0409 17:01:13.888485 139974862964544 model.py:175] Output shape was {'distogram': {'bin_edges': (63,), 'logits': (68, 68, 64)}, 'experimentally_resolved': {'logits': (68, 37)}, 'masked_msa': {'logits': (512, 68, 23)}, 'predicted_lddt': {'logits': (68, 50)}, 'structure_module': {'final_atom_mask': (68, 37), 'final_atom_positions': (68, 37, 3)}, 'plddt': (68,), 'ranking_confidence': ()}
I0409 17:01:13.888778 139974862964544 run_alphafold.py:203] Total JAX model model_3_pred_0 on query predict time (includes compilation time, see --benchmark): 130.9s
I0409 17:01:18.714238 139974862964544 amber_minimize.py:178] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 67 (GLU) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 17:01:18.799035 139974862964544 amber_minimize.py:408] Minimizing protein, attempt 1 of 100.
I0409 17:01:18.977876 139974862964544 amber_minimize.py:69] Restraining 574 / 1170 particles.
I0409 17:01:21.329899 139974862964544 amber_minimize.py:178] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 17:01:21.469244 139974862964544 amber_minimize.py:500] Iteration completed: Einit 1417.15 Efinal -2149.67 Time 1.07 s num residue violations 0 num residue exclusions 0 
I0409 17:01:21.516450 139974862964544 run_alphafold.py:191] Running model model_4_pred_0 on query
I0409 17:01:23.141258 139974862964544 model.py:165] Running predict with shape(feat) = {'aatype': (4, 68), 'residue_index': (4, 68), 'seq_length': (4,), 'is_distillation': (4,), 'seq_mask': (4, 68), 'msa_mask': (4, 512, 68), 'msa_row_mask': (4, 512), 'random_crop_to_size_seed': (4, 2), 'atom14_atom_exists': (4, 68, 14), 'residx_atom14_to_atom37': (4, 68, 14), 'residx_atom37_to_atom14': (4, 68, 37), 'atom37_atom_exists': (4, 68, 37), 'extra_msa': (4, 5120, 68), 'extra_msa_mask': (4, 5120, 68), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 512, 68), 'true_msa': (4, 512, 68), 'extra_has_deletion': (4, 5120, 68), 'extra_deletion_value': (4, 5120, 68), 'msa_feat': (4, 512, 68, 49), 'target_feat': (4, 68, 22)}
I0409 17:03:32.715506 139974862964544 model.py:175] Output shape was {'distogram': {'bin_edges': (63,), 'logits': (68, 68, 64)}, 'experimentally_resolved': {'logits': (68, 37)}, 'masked_msa': {'logits': (512, 68, 23)}, 'predicted_lddt': {'logits': (68, 50)}, 'structure_module': {'final_atom_mask': (68, 37), 'final_atom_positions': (68, 37, 3)}, 'plddt': (68,), 'ranking_confidence': ()}
I0409 17:03:32.715815 139974862964544 run_alphafold.py:203] Total JAX model model_4_pred_0 on query predict time (includes compilation time, see --benchmark): 129.6s
I0409 17:03:37.781708 139974862964544 amber_minimize.py:178] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 67 (GLU) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 17:03:37.869994 139974862964544 amber_minimize.py:408] Minimizing protein, attempt 1 of 100.
I0409 17:03:38.058344 139974862964544 amber_minimize.py:69] Restraining 574 / 1170 particles.
I0409 17:03:40.003796 139974862964544 amber_minimize.py:178] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 17:03:40.142424 139974862964544 amber_minimize.py:500] Iteration completed: Einit 912.01 Efinal -2098.49 Time 1.07 s num residue violations 0 num residue exclusions 0 
I0409 17:03:40.187411 139974862964544 run_alphafold.py:191] Running model model_5_pred_0 on query
I0409 17:03:42.355729 139974862964544 model.py:165] Running predict with shape(feat) = {'aatype': (4, 68), 'residue_index': (4, 68), 'seq_length': (4,), 'is_distillation': (4,), 'seq_mask': (4, 68), 'msa_mask': (4, 512, 68), 'msa_row_mask': (4, 512), 'random_crop_to_size_seed': (4, 2), 'atom14_atom_exists': (4, 68, 14), 'residx_atom14_to_atom37': (4, 68, 14), 'residx_atom37_to_atom14': (4, 68, 37), 'atom37_atom_exists': (4, 68, 37), 'extra_msa': (4, 1024, 68), 'extra_msa_mask': (4, 1024, 68), 'extra_msa_row_mask': (4, 1024), 'bert_mask': (4, 512, 68), 'true_msa': (4, 512, 68), 'extra_has_deletion': (4, 1024, 68), 'extra_deletion_value': (4, 1024, 68), 'msa_feat': (4, 512, 68, 49), 'target_feat': (4, 68, 22)}
I0409 17:05:54.998405 139974862964544 model.py:175] Output shape was {'distogram': {'bin_edges': (63,), 'logits': (68, 68, 64)}, 'experimentally_resolved': {'logits': (68, 37)}, 'masked_msa': {'logits': (512, 68, 23)}, 'predicted_lddt': {'logits': (68, 50)}, 'structure_module': {'final_atom_mask': (68, 37), 'final_atom_positions': (68, 37, 3)}, 'plddt': (68,), 'ranking_confidence': ()}
I0409 17:05:54.998687 139974862964544 run_alphafold.py:203] Total JAX model model_5_pred_0 on query predict time (includes compilation time, see --benchmark): 132.6s
I0409 17:05:59.617951 139974862964544 amber_minimize.py:178] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 67 (GLU) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 17:05:59.703099 139974862964544 amber_minimize.py:408] Minimizing protein, attempt 1 of 100.
I0409 17:06:00.330353 139974862964544 amber_minimize.py:69] Restraining 574 / 1170 particles.
I0409 17:06:02.288254 139974862964544 amber_minimize.py:178] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 17:06:02.423708 139974862964544 amber_minimize.py:500] Iteration completed: Einit 1366.86 Efinal -2117.71 Time 1.52 s num residue violations 0 num residue exclusions 0 
I0409 17:06:02.470051 139974862964544 run_alphafold.py:277] Final timings for query: {'features': 198.3622589111328, 'process_features_model_1_pred_0': 3.288827419281006, 'predict_and_compile_model_1_pred_0': 162.52829456329346, 'relax_model_1_pred_0': 8.678949356079102, 'process_features_model_2_pred_0': 1.8950560092926025, 'predict_and_compile_model_2_pred_0': 161.90068864822388, 'relax_model_2_pred_0': 5.575250148773193, 'process_features_model_3_pred_0': 2.012259006500244, 'predict_and_compile_model_3_pred_0': 130.9280300140381, 'relax_model_3_pred_0': 6.232355356216431, 'process_features_model_4_pred_0': 1.6244299411773682, 'predict_and_compile_model_4_pred_0': 129.57472157478333, 'relax_model_4_pred_0': 6.115770101547241, 'process_features_model_5_pred_0': 2.1679391860961914, 'predict_and_compile_model_5_pred_0': 132.64313626289368, 'relax_model_5_pred_0': 6.109053611755371}
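Outside a notebook, the `!` shell escape is not available. One alternative is to build the same invocation as an argument list and hand it to `subprocess.run`. This is a sketch under assumptions: `build_alphafold_command` is a hypothetical helper, the flags `-d`, `-o`, `-f`, `-t` and `-p` are taken from the command above, and the script is assumed to need the AlphaFold directory as its working directory because the original command changes into it first.

```python
import os
import shlex


def build_alphafold_command(database_dir, output_dir, fasta_path,
                            max_template_date, use_precomputed_msas):
    """Return the run_alphafold.sh invocation as an argument list."""
    return [
        "bash", "run_alphafold.sh",
        "-d", database_dir,
        "-o", os.path.abspath(output_dir),
        "-f", fasta_path,
        "-t", max_template_date,
        "-p", use_precomputed_msas,
    ]


args = build_alphafold_command(
    "/share/structure_prediction/af2_database/",
    "/opt/alphafold_project/alphafold-2.3.1/precomputed_test/",
    "/opt/alphafold_project/alphafold-2.3.1/example/query.fasta",
    "2020-05-14",
    "true",
)
print(shlex.join(args))  # shell-quoted form, handy for logging

# To launch the run, pass the list to subprocess with the AlphaFold
# directory as the working directory (replaces the `cd ... &&` prefix):
# import subprocess
# subprocess.run(args, cwd="/opt/alphafold_project/alphafold-2.3.1", check=True)
```

Passing a list avoids shell quoting problems if a path ever contains spaces, and `check=True` raises an exception when the script exits with a non-zero status.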

5. Inspect the ranking

Review the model ranking and confidence scores.

[5]

import json

fasta_name = Path(FASTA_PATH).stem
# Read the ranking information
ranking_file = os.path.join(OUTPUT_DIR, fasta_name, 'ranking_debug.json')
if os.path.exists(ranking_file):
    with open(ranking_file, 'r') as f:
        ranking_data = json.load(f)
    # The first key holds the confidence metric: 'plddts' or 'iptm+ptm'
    confidence_key = list(ranking_data.keys())[0]
    # 'order' lists the model names from best to worst
    best_model_name = ranking_data['order'][0]
    print("Model ranking:")
    for i, model_name in enumerate(ranking_data['order']):
        confidence = ranking_data[confidence_key][model_name]
        print(f"  {i+1}. {model_name} {confidence_key}: {confidence:.4f} ")
else:
    print(f"Ranking file not found: {ranking_file}")

Model ranking:
  1. model_3_pred_0 plddts: 88.0761 
  2. model_1_pred_0 plddts: 86.5758 
  3. model_2_pred_0 plddts: 86.1264 
  4. model_4_pred_0 plddts: 85.5704 
  5. model_5_pred_0 plddts: 84.4112


[6]

best_model_name

'model_3_pred_0'
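The ranking is simply the models sorted by their confidence score in descending order. The sketch below reproduces that ordering from the mean pLDDT values printed above; `rank_models` is a hypothetical helper, not an AlphaFold function.

```python
def rank_models(scores):
    """Return model names sorted from highest to lowest confidence."""
    return sorted(scores, key=scores.get, reverse=True)


# Mean pLDDT values copied from the ranking output above.
plddts = {
    "model_1_pred_0": 86.5758,
    "model_2_pred_0": 86.1264,
    "model_3_pred_0": 88.0761,
    "model_4_pred_0": 85.5704,
    "model_5_pred_0": 84.4112,
}
order = rank_models(plddts)
print(order[0])  # model_3_pred_0
```

The differences between models are small here (about 3.7 pLDDT points between best and worst), so the lower-ranked structures may still be worth comparing.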

6. Visualize the results

Once the prediction has finished, we can visualize the predicted protein structure.

[7]

def visualize_pdb(pdb_file, width=800, height=600):
    """Visualize a PDB file with py3Dmol."""
    with open(pdb_file, 'r') as f:
        pdb_content = f.read()
    viewer = py3Dmol.view(width=width, height=height)
    viewer.addModel(pdb_content, 'pdb')
    # Cartoon representation, rainbow-colored along the chain
    viewer.setStyle({'cartoon': {'colorscheme': 'rainbow'}})
    viewer.spin(True)
    viewer.zoomTo()
    return viewer


[8]

# Locate the top-ranked model and visualize it
# AlphaFold writes ranked_*.pdb in order of confidence, so ranked_0.pdb is the
# best model (best_model_name); the index is the rank, not the model number
fasta_name = Path(FASTA_PATH).stem
predicted_pdb = os.path.join(OUTPUT_DIR, fasta_name, 'ranked_0.pdb')
if os.path.exists(predicted_pdb):
    view = visualize_pdb(predicted_pdb)
    view.show()
else:
    print(f"Prediction not found: {predicted_pdb}")
    print("Run the AlphaFold2 prediction first.")
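AlphaFold stores the per-residue pLDDT in the B-factor column of the PDB files it writes, so the confidence can be read straight from the structure file. The following sketch assumes a standard fixed-column PDB file with a single chain; `plddt_from_pdb` is a hypothetical helper name.

```python
def plddt_from_pdb(pdb_text):
    """Return {residue_number: pLDDT} read from the CA atoms of a PDB string.

    Assumes fixed-column PDB records and a single chain, so residue
    numbers are unique.
    """
    plddt = {}
    for line in pdb_text.splitlines():
        # Columns 13-16 hold the atom name, 23-26 the residue number,
        # and 61-66 the B-factor (0-based slices below).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            res_num = int(line[22:26])
            plddt[res_num] = float(line[60:66])
    return plddt


# Example usage with the file visualized above:
# with open(predicted_pdb) as f:
#     per_residue = plddt_from_pdb(f.read())
# print(min(per_residue.values()), max(per_residue.values()))
```

Reading one value per CA atom is enough because every atom of a residue carries the same pLDDT in these files.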

7. Assess prediction quality

Inspect the pLDDT scores to assess the quality of the prediction.

[9]

import json
import pickle

def plot_plddt(output_dir, fasta_name):
    """Plot the per-residue pLDDT scores of every model."""
    result_dir = os.path.join(output_dir, fasta_name)
    # Sort so the legend order is stable across runs
    result_files = sorted(f for f in os.listdir(result_dir)
                          if f.startswith('result_') and f.endswith('.pkl'))
    if not result_files:
        print("No result files found")
        return
    plt.figure(figsize=(12, 6))
    for result_file in result_files:
        model_name = result_file.replace('result_', '').replace('.pkl', '')
        with open(os.path.join(result_dir, result_file), 'rb') as f:
            result = pickle.load(f)
        plddt = result['plddt']
        plt.plot(plddt, label=model_name)
    # Residues above this line are generally considered confidently predicted
    plt.axhline(y=70, color='r', linestyle='--', label='pLDDT=70')
    plt.xlabel('residue position')
    plt.ylabel('pLDDT score')
    plt.title('prediction quality (pLDDT)')
    plt.legend()
    plt.grid(True)
    plt.show()


[10]

# Plot the pLDDT scores
fasta_name = Path(FASTA_PATH).stem
plot_plddt(OUTPUT_DIR, fasta_name)
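A plot shows where confidence drops, but a numeric summary is easier to compare across models. The sketch below bins pLDDT values into the four confidence bands used by the AlphaFold Protein Structure Database (above 90, 70 to 90, 50 to 70, below 50). The band convention comes from that database rather than from this notebook, the handling of exact boundary values is my own choice, and `summarize_plddt` is a hypothetical helper.

```python
def summarize_plddt(plddt):
    """Return the fraction of residues that fall in each confidence band."""
    counts = {
        "very high (>= 90)": 0,
        "confident (70-90)": 0,
        "low (50-70)": 0,
        "very low (< 50)": 0,
    }
    values = [float(v) for v in plddt]
    for v in values:
        if v >= 90:
            counts["very high (>= 90)"] += 1
        elif v >= 70:
            counts["confident (70-90)"] += 1
        elif v >= 50:
            counts["low (50-70)"] += 1
        else:
            counts["very low (< 50)"] += 1
    total = len(values)
    # Avoid dividing by zero for an empty input.
    return {band: (n / total if total else 0.0) for band, n in counts.items()}


# Example usage with a result dictionary loaded as in plot_plddt:
# for band, fraction in summarize_plddt(result['plddt']).items():
#     print(f"{band}: {fraction:.1%}")
```

The function accepts any iterable of numbers, so it works with both the NumPy array stored in the result file and a plain list.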

8. Inspect the timings

Review how long each stage of the prediction took.

[11]

# Read the timing information
timings_file = os.path.join(OUTPUT_DIR, fasta_name, 'timings.json')
if os.path.exists(timings_file):
    with open(timings_file, 'r') as f:
        timings_data = json.load(f)
    print("Time per step (seconds):")
    for step, seconds in timings_data.items():
        print(f"  {step}: {seconds:.2f}")
else:
    print(f"Timings file not found: {timings_file}")

Time per step (seconds):
  features: 198.36
  process_features_model_1_pred_0: 3.29
  predict_and_compile_model_1_pred_0: 162.53
  relax_model_1_pred_0: 8.68
  process_features_model_2_pred_0: 1.90
  predict_and_compile_model_2_pred_0: 161.90
  relax_model_2_pred_0: 5.58
  process_features_model_3_pred_0: 2.01
  predict_and_compile_model_3_pred_0: 130.93
  relax_model_3_pred_0: 6.23
  process_features_model_4_pred_0: 1.62
  predict_and_compile_model_4_pred_0: 129.57
  relax_model_4_pred_0: 6.12
  process_features_model_5_pred_0: 2.17
  predict_and_compile_model_5_pred_0: 132.64
  relax_model_5_pred_0: 6.11
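Grouping the per-model entries by stage makes it easier to see where the time goes. This is a small sketch, not part of the original notebook: `summarize_timings` is a hypothetical helper that assumes every per-model key has the form `<stage>_model_<n>_pred_<k>`, as in the output above.

```python
from collections import defaultdict


def summarize_timings(timings):
    """Sum timings per stage, dropping the per-model suffix from each key."""
    totals = defaultdict(float)
    for key, seconds in timings.items():
        # 'relax_model_3_pred_0' -> 'relax'; 'features' stays unchanged.
        stage = key.split("_model_")[0]
        totals[stage] += seconds
    return dict(totals)


# Values copied from the output above (seconds, rounded to two decimals).
timings = {
    "features": 198.36,
    "process_features_model_1_pred_0": 3.29,
    "predict_and_compile_model_1_pred_0": 162.53,
    "relax_model_1_pred_0": 8.68,
    "process_features_model_2_pred_0": 1.90,
    "predict_and_compile_model_2_pred_0": 161.90,
    "relax_model_2_pred_0": 5.58,
    "process_features_model_3_pred_0": 2.01,
    "predict_and_compile_model_3_pred_0": 130.93,
    "relax_model_3_pred_0": 6.23,
    "process_features_model_4_pred_0": 1.62,
    "predict_and_compile_model_4_pred_0": 129.57,
    "relax_model_4_pred_0": 6.12,
    "process_features_model_5_pred_0": 2.17,
    "predict_and_compile_model_5_pred_0": 132.64,
    "relax_model_5_pred_0": 6.11,
}
totals = summarize_timings(timings)
grand_total = sum(totals.values())
for stage, seconds in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{stage}: {seconds:.2f} s ({seconds / grand_total:.0%})")
```

Prediction dominates the run. As the log notes, each prediction time includes JAX compilation, so the figures overstate pure inference time for a 68-residue protein.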

References

Jumper, John, et al. "Highly accurate protein structure prediction with AlphaFold." Nature 596.7873 (2021): 583-589.
Project repository: https://github.com/google-deepmind/alphafold
