Basic AlphaFold2 Workflow


Updated 2025-04-09
Recommended image: structure-pred-af:version_1
Recommended machine: c12_m92_1 * NVIDIA V100
This notebook walks through the basic AlphaFold2 workflow and visualizes the results. To save time, it uses precomputed MSAs (multiple sequence alignments).
1. Environment setup
[1]
import os
import sys
import subprocess
from pathlib import Path
import matplotlib.pyplot as plt
import numpy as np
import py3Dmol
from IPython.display import display, HTML
2. Set parameters
Set the parameters that AlphaFold2 needs for this run.
[2]
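# Paths
ALPHAFOLD_PATH = "/opt/alphafold_project/alphafold-2.3.1"  # AlphaFold installation directory
DATABASE_DIR = "/share/structure_prediction/af2_database/"  # AlphaFold database directory
OUTPUT_DIR = "/opt/alphafold_project/alphafold-2.3.1/precomputed_test/"  # output directory
FASTA_PATH = "/opt/alphafold_project/alphafold-2.3.1/example/query.fasta"  # input FASTA file
MAX_TEMPLATE_DATE = "2020-05-14"  # latest template release date to consider
# MSA generation is the most time-consuming part of a real prediction, so this run reuses precomputed MSAs
USE_PRECOMPUTED_MSAS = "true"  # whether to use precomputed MSAs (passed to the shell script as a string)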
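With `USE_PRECOMPUTED_MSAS` set to `"true"`, AlphaFold reads existing alignment files from the `msas` subdirectory of the output folder and does not rerun the MSA search. The log in step 4 shows it reading `uniref90_hits.sto`, `mgnify_hits.sto` and `bfd_uniref_hits.a3m` from `precomputed_test/query/msas/`. The sketch below checks for those files before a run is launched. The helper name `missing_msas` is hypothetical, and the file list is copied from this run's log, so other database presets may expect different files.

```python
import tempfile
from pathlib import Path

# Alignment files the step 4 log shows AlphaFold reading for this run
# (an assumption for other database presets or model types).
EXPECTED_MSAS = ("uniref90_hits.sto", "mgnify_hits.sto", "bfd_uniref_hits.a3m")


def missing_msas(output_dir, fasta_path, expected=EXPECTED_MSAS):
    """Hypothetical helper: list expected MSA files absent from <output_dir>/<fasta stem>/msas/."""
    msa_dir = Path(output_dir) / Path(fasta_path).stem / "msas"
    return [name for name in expected if not (msa_dir / name).is_file()]


if __name__ == "__main__":
    # Demo on a throwaway directory that holds only one of the three files.
    with tempfile.TemporaryDirectory() as tmp:
        msa_dir = Path(tmp) / "query" / "msas"
        msa_dir.mkdir(parents=True)
        (msa_dir / "uniref90_hits.sto").write_text("# STOCKHOLM 1.0\n//\n")
        print(missing_msas(tmp, "/some/where/query.fasta"))
        # → ['mgnify_hits.sto', 'bfd_uniref_hits.a3m']
```

In this notebook, `missing_msas(OUTPUT_DIR, FASTA_PATH)` should return an empty list before a run with precomputed MSAs.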
3. Prepare the input FASTA file
Inspect the contents of the input FASTA file.
[3]
# Print the contents of the FASTA file
# FASTA_PATH is already absolute, so it can be opened directly
with open(FASTA_PATH, 'r') as f:
    fasta_content = f.read()
print(fasta_content)
>dummy_sequence
GWSTELEKHREELKEFLKKEGITNVEIRIDNGRLEVRVEGGTERLKRFLEELRQKLEKKGYTVDIKIE
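AlphaFold's monomer pipeline expects exactly one sequence per FASTA file. The file's stem (`query`) also becomes the name of the output subdirectory that later steps use. Before a long run, it is worth parsing the input and checking the sequence length. The parser below is a minimal hypothetical helper and is not part of AlphaFold. For this file it reports 68 residues, which matches feature shapes such as `(4, 68)` in the step 4 log.

```python
def parse_fasta(text):
    """Return a list of (description, sequence) pairs from FASTA-formatted text.

    Sequence lines that appear before the first '>' header are ignored.
    """
    records = []
    description, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if description is not None:
                records.append((description, "".join(chunks)))
            description, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if description is not None:
        records.append((description, "".join(chunks)))
    return records


fasta_text = """>dummy_sequence
GWSTELEKHREELKEFLKKEGITNVEIRIDNGRLEVRVEGGTERLKRFLEELRQKLEKKGYTVDIKIE
"""
records = parse_fasta(fasta_text)
name, seq = records[0]
print(len(records), name, len(seq))  # → 1 dummy_sequence 68
```

In the notebook, `parse_fasta(fasta_content)` applies the same check to the file that was just read.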
4. Run AlphaFold2
[4]
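# Make sure the output directory exists
# Expect roughly 10 to 15 minutes; judging by the log timestamps, the run below took just over 16 minutes
os.makedirs(OUTPUT_DIR, exist_ok=True)
# Build the AlphaFold2 command line
cmd = (
    f"cd {ALPHAFOLD_PATH} && bash run_alphafold.sh"
    f" -d {DATABASE_DIR}"
    f" -o {os.path.abspath(OUTPUT_DIR)}"
    f" -f {FASTA_PATH}"
    f" -t {MAX_TEMPLATE_DATE}"
    f" -p {USE_PRECOMPUTED_MSAS}"
)
print(f"Command: {cmd}")
# Run the command through IPython's shell escape (works only inside Jupyter/IPython)
!{cmd}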
Command: cd /opt/alphafold_project/alphafold-2.3.1 && bash run_alphafold.sh -d /share/structure_prediction/af2_database/ -o /opt/alphafold_project/alphafold-2.3.1/precomputed_test -f /opt/alphafold_project/alphafold-2.3.1/example/query.fasta -t 2020-05-14 -p true
I0409 16:49:26.580164 139974862964544 templates.py:857] Using precomputed obsolete pdbs /share/structure_prediction/af2_database//pdb_mmcif/obsolete.dat.
I0409 16:49:30.901414 139974862964544 xla_bridge.py:353] Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to find driver in registry given worker:
I0409 16:49:31.044875 139974862964544 xla_bridge.py:353] Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: Host Interpreter CUDA
I0409 16:49:31.045314 139974862964544 xla_bridge.py:353] Unable to initialize backend 'tpu': module 'jaxlib.xla_extension' has no attribute 'get_tpu_client'
I0409 16:49:31.045428 139974862964544 xla_bridge.py:353] Unable to initialize backend 'plugin': xla_extension has no attributes named get_plugin_device_client. Compile TensorFlow with //tensorflow/compiler/xla/python:enable_plugin_device set to true (defaults to false) to enable this.
I0409 16:49:55.518093 139974862964544 run_alphafold.py:386] Have 5 models: ['model_1_pred_0', 'model_2_pred_0', 'model_3_pred_0', 'model_4_pred_0', 'model_5_pred_0']
I0409 16:49:55.518320 139974862964544 run_alphafold.py:403] Using random seed 442039092369751438 for the data pipeline
I0409 16:49:55.518614 139974862964544 run_alphafold.py:161] Predicting query
W0409 16:49:55.518854 139974862964544 pipeline.py:100] Reading MSA from file /opt/alphafold_project/alphafold-2.3.1/precomputed_test/query/msas/uniref90_hits.sto
W0409 16:49:55.519006 139974862964544 pipeline.py:100] Reading MSA from file /opt/alphafold_project/alphafold-2.3.1/precomputed_test/query/msas/mgnify_hits.sto
I0409 16:49:55.519525 139974862964544 hhsearch.py:85] Launching subprocess "/opt/mamba/bin/hhsearch -i /tmp/tmpk141pi3w/query.a3m -o /tmp/tmpk141pi3w/output.hhr -maxseq 1000000 -d /share/structure_prediction/af2_database//pdb70/pdb70"
I0409 16:49:55.608169 139974862964544 utils.py:36] Started HHsearch query
I0409 16:53:04.004565 139974862964544 utils.py:40] Finished HHsearch query in 188.396 seconds
W0409 16:53:04.031605 139974862964544 pipeline.py:100] Reading MSA from file /opt/alphafold_project/alphafold-2.3.1/precomputed_test/query/msas/bfd_uniref_hits.a3m
I0409 16:53:04.031926 139974862964544 templates.py:878] Searching for template for: GWSTELEKHREELKEFLKKEGITNVEIRIDNGRLEVRVEGGTERLKRFLEELRQKLEKKGYTVDIKIE
I0409 16:53:04.032404 139974862964544 templates.py:718] hit 6mrr_A did not pass prefilter: Template is an exact subsequence of query with large coverage. Length ratio: 1.0.
I0409 16:53:04.032500 139974862964544 templates.py:912] Skipped invalid hit 6MRR_A foldit1; De novo protein, Foldit; 1.18A {synthetic construct}, error: None, warning: None
I0409 16:53:04.526734 139974862964544 templates.py:267] Found an exact template match 6q64_A.
I0409 16:53:05.057441 139974862964544 templates.py:267] Found an exact template match 4s3k_A.
I0409 16:53:05.531115 139974862964544 templates.py:267] Found an exact template match 5jh8_A.
I0409 16:53:05.828265 139974862964544 templates.py:267] Found an exact template match 1jnd_A.
I0409 16:53:06.471878 139974862964544 templates.py:267] Found an exact template match 5y2a_B.
I0409 16:53:08.018202 139974862964544 templates.py:267] Found an exact template match 4wiw_B.
I0409 16:53:08.031688 139974862964544 templates.py:267] Found an exact template match 4wiw_D.
I0409 16:53:08.543946 139974862964544 templates.py:267] Found an exact template match 6jm7_A.
I0409 16:53:08.772002 139974862964544 templates.py:267] Found an exact template match 6jmb_A.
I0409 16:53:09.166378 139974862964544 templates.py:267] Found an exact template match 4q6t_A.
I0409 16:53:09.689095 139974862964544 templates.py:267] Found an exact template match 3oa5_B.
I0409 16:53:10.124649 139974862964544 templates.py:267] Found an exact template match 5y2c_A.
I0409 16:53:10.233451 139974862964544 templates.py:267] Found an exact template match 5cuk_A.
I0409 16:53:11.684303 139974862964544 templates.py:267] Found an exact template match 4a5q_E.
I0409 16:53:11.934331 139974862964544 templates.py:267] Found an exact template match 5y2b_A.
I0409 16:53:12.207329 139974862964544 templates.py:267] Found an exact template match 4lgx_A.
I0409 16:53:12.808989 139974862964544 templates.py:267] Found an exact template match 4w5u_A.
I0409 16:53:13.496593 139974862964544 templates.py:267] Found an exact template match 6jav_A.
I0409 16:53:13.854119 139974862964544 templates.py:267] Found an exact template match 3cz8_A.
I0409 16:53:13.866688 139974862964544 templates.py:267] Found an exact template match 3cz8_B.
I0409 16:53:13.879991 139974862964544 pipeline.py:234] Uniref90 MSA size: 2 sequences.
I0409 16:53:13.880134 139974862964544 pipeline.py:235] BFD MSA size: 1 sequences.
I0409 16:53:13.880180 139974862964544 pipeline.py:236] MGnify MSA size: 2 sequences.
I0409 16:53:13.880219 139974862964544 pipeline.py:237] Final (deduplicated) MSA size: 2 sequences.
I0409 16:53:13.880470 139974862964544 pipeline.py:239] Total number of templates (NB: this can include bad templates and is later filtered to top 4): 20.
I0409 16:53:13.882184 139974862964544 run_alphafold.py:191] Running model model_1_pred_0 on query
I0409 16:53:17.171739 139974862964544 model.py:165] Running predict with shape(feat) = {'aatype': (4, 68), 'residue_index': (4, 68), 'seq_length': (4,), 'template_aatype': (4, 4, 68), 'template_all_atom_masks': (4, 4, 68, 37), 'template_all_atom_positions': (4, 4, 68, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 68), 'msa_mask': (4, 508, 68), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 68, 3), 'template_pseudo_beta_mask': (4, 4, 68), 'atom14_atom_exists': (4, 68, 14), 'residx_atom14_to_atom37': (4, 68, 14), 'residx_atom37_to_atom14': (4, 68, 37), 'atom37_atom_exists': (4, 68, 37), 'extra_msa': (4, 5120, 68), 'extra_msa_mask': (4, 5120, 68), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 508, 68), 'true_msa': (4, 508, 68), 'extra_has_deletion': (4, 5120, 68), 'extra_deletion_value': (4, 5120, 68), 'msa_feat': (4, 508, 68, 49), 'target_feat': (4, 68, 22)}
I0409 16:55:59.699222 139974862964544 model.py:175] Output shape was {'distogram': {'bin_edges': (63,), 'logits': (68, 68, 64)}, 'experimentally_resolved': {'logits': (68, 37)}, 'masked_msa': {'logits': (508, 68, 23)}, 'predicted_lddt': {'logits': (68, 50)}, 'structure_module': {'final_atom_mask': (68, 37), 'final_atom_positions': (68, 37, 3)}, 'plddt': (68,), 'ranking_confidence': ()}
I0409 16:55:59.699431 139974862964544 run_alphafold.py:203] Total JAX model model_1_pred_0 on query predict time (includes compilation time, see --benchmark): 162.5s
I0409 16:56:05.644319 139974862964544 amber_minimize.py:178] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 67 (GLU) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 16:56:05.726140 139974862964544 amber_minimize.py:408] Minimizing protein, attempt 1 of 100.
I0409 16:56:05.904006 139974862964544 amber_minimize.py:69] Restraining 574 / 1170 particles.
I0409 16:56:07.666443 139974862964544 amber_minimize.py:178] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 16:56:10.182704 139974862964544 amber_minimize.py:500] Iteration completed: Einit 801.10 Efinal -2145.04 Time 0.98 s num residue violations 0 num residue exclusions 0
I0409 16:56:10.258744 139974862964544 run_alphafold.py:191] Running model model_2_pred_0 on query
I0409 16:56:12.154409 139974862964544 model.py:165] Running predict with shape(feat) = {'aatype': (4, 68), 'residue_index': (4, 68), 'seq_length': (4,), 'template_aatype': (4, 4, 68), 'template_all_atom_masks': (4, 4, 68, 37), 'template_all_atom_positions': (4, 4, 68, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 68), 'msa_mask': (4, 508, 68), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 68, 3), 'template_pseudo_beta_mask': (4, 4, 68), 'atom14_atom_exists': (4, 68, 14), 'residx_atom14_to_atom37': (4, 68, 14), 'residx_atom37_to_atom14': (4, 68, 37), 'atom37_atom_exists': (4, 68, 37), 'extra_msa': (4, 1024, 68), 'extra_msa_mask': (4, 1024, 68), 'extra_msa_row_mask': (4, 1024), 'bert_mask': (4, 508, 68), 'true_msa': (4, 508, 68), 'extra_has_deletion': (4, 1024, 68), 'extra_deletion_value': (4, 1024, 68), 'msa_feat': (4, 508, 68, 49), 'target_feat': (4, 68, 22)}
I0409 16:58:54.054380 139974862964544 model.py:175] Output shape was {'distogram': {'bin_edges': (63,), 'logits': (68, 68, 64)}, 'experimentally_resolved': {'logits': (68, 37)}, 'masked_msa': {'logits': (508, 68, 23)}, 'predicted_lddt': {'logits': (68, 50)}, 'structure_module': {'final_atom_mask': (68, 37), 'final_atom_positions': (68, 37, 3)}, 'plddt': (68,), 'ranking_confidence': ()}
I0409 16:58:54.054674 139974862964544 run_alphafold.py:203] Total JAX model model_2_pred_0 on query predict time (includes compilation time, see --benchmark): 161.9s
I0409 16:58:58.654386 139974862964544 amber_minimize.py:178] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 67 (GLU) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 16:58:58.740167 139974862964544 amber_minimize.py:408] Minimizing protein, attempt 1 of 100.
I0409 16:58:58.918553 139974862964544 amber_minimize.py:69] Restraining 574 / 1170 particles.
I0409 16:59:00.766768 139974862964544 amber_minimize.py:178] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 16:59:00.900718 139974862964544 amber_minimize.py:500] Iteration completed: Einit 1177.51 Efinal -2115.41 Time 1.00 s num residue violations 0 num residue exclusions 0
I0409 16:59:00.948315 139974862964544 run_alphafold.py:191] Running model model_3_pred_0 on query
I0409 16:59:02.960916 139974862964544 model.py:165] Running predict with shape(feat) = {'aatype': (4, 68), 'residue_index': (4, 68), 'seq_length': (4,), 'is_distillation': (4,), 'seq_mask': (4, 68), 'msa_mask': (4, 512, 68), 'msa_row_mask': (4, 512), 'random_crop_to_size_seed': (4, 2), 'atom14_atom_exists': (4, 68, 14), 'residx_atom14_to_atom37': (4, 68, 14), 'residx_atom37_to_atom14': (4, 68, 37), 'atom37_atom_exists': (4, 68, 37), 'extra_msa': (4, 5120, 68), 'extra_msa_mask': (4, 5120, 68), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 512, 68), 'true_msa': (4, 512, 68), 'extra_has_deletion': (4, 5120, 68), 'extra_deletion_value': (4, 5120, 68), 'msa_feat': (4, 512, 68, 49), 'target_feat': (4, 68, 22)}
I0409 17:01:13.888485 139974862964544 model.py:175] Output shape was {'distogram': {'bin_edges': (63,), 'logits': (68, 68, 64)}, 'experimentally_resolved': {'logits': (68, 37)}, 'masked_msa': {'logits': (512, 68, 23)}, 'predicted_lddt': {'logits': (68, 50)}, 'structure_module': {'final_atom_mask': (68, 37), 'final_atom_positions': (68, 37, 3)}, 'plddt': (68,), 'ranking_confidence': ()}
I0409 17:01:13.888778 139974862964544 run_alphafold.py:203] Total JAX model model_3_pred_0 on query predict time (includes compilation time, see --benchmark): 130.9s
I0409 17:01:18.714238 139974862964544 amber_minimize.py:178] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 67 (GLU) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 17:01:18.799035 139974862964544 amber_minimize.py:408] Minimizing protein, attempt 1 of 100.
I0409 17:01:18.977876 139974862964544 amber_minimize.py:69] Restraining 574 / 1170 particles.
I0409 17:01:21.329899 139974862964544 amber_minimize.py:178] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 17:01:21.469244 139974862964544 amber_minimize.py:500] Iteration completed: Einit 1417.15 Efinal -2149.67 Time 1.07 s num residue violations 0 num residue exclusions 0
I0409 17:01:21.516450 139974862964544 run_alphafold.py:191] Running model model_4_pred_0 on query
I0409 17:01:23.141258 139974862964544 model.py:165] Running predict with shape(feat) = {'aatype': (4, 68), 'residue_index': (4, 68), 'seq_length': (4,), 'is_distillation': (4,), 'seq_mask': (4, 68), 'msa_mask': (4, 512, 68), 'msa_row_mask': (4, 512), 'random_crop_to_size_seed': (4, 2), 'atom14_atom_exists': (4, 68, 14), 'residx_atom14_to_atom37': (4, 68, 14), 'residx_atom37_to_atom14': (4, 68, 37), 'atom37_atom_exists': (4, 68, 37), 'extra_msa': (4, 5120, 68), 'extra_msa_mask': (4, 5120, 68), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 512, 68), 'true_msa': (4, 512, 68), 'extra_has_deletion': (4, 5120, 68), 'extra_deletion_value': (4, 5120, 68), 'msa_feat': (4, 512, 68, 49), 'target_feat': (4, 68, 22)}
I0409 17:03:32.715506 139974862964544 model.py:175] Output shape was {'distogram': {'bin_edges': (63,), 'logits': (68, 68, 64)}, 'experimentally_resolved': {'logits': (68, 37)}, 'masked_msa': {'logits': (512, 68, 23)}, 'predicted_lddt': {'logits': (68, 50)}, 'structure_module': {'final_atom_mask': (68, 37), 'final_atom_positions': (68, 37, 3)}, 'plddt': (68,), 'ranking_confidence': ()}
I0409 17:03:32.715815 139974862964544 run_alphafold.py:203] Total JAX model model_4_pred_0 on query predict time (includes compilation time, see --benchmark): 129.6s
I0409 17:03:37.781708 139974862964544 amber_minimize.py:178] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 67 (GLU) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 17:03:37.869994 139974862964544 amber_minimize.py:408] Minimizing protein, attempt 1 of 100.
I0409 17:03:38.058344 139974862964544 amber_minimize.py:69] Restraining 574 / 1170 particles.
I0409 17:03:40.003796 139974862964544 amber_minimize.py:178] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 17:03:40.142424 139974862964544 amber_minimize.py:500] Iteration completed: Einit 912.01 Efinal -2098.49 Time 1.07 s num residue violations 0 num residue exclusions 0
I0409 17:03:40.187411 139974862964544 run_alphafold.py:191] Running model model_5_pred_0 on query
I0409 17:03:42.355729 139974862964544 model.py:165] Running predict with shape(feat) = {'aatype': (4, 68), 'residue_index': (4, 68), 'seq_length': (4,), 'is_distillation': (4,), 'seq_mask': (4, 68), 'msa_mask': (4, 512, 68), 'msa_row_mask': (4, 512), 'random_crop_to_size_seed': (4, 2), 'atom14_atom_exists': (4, 68, 14), 'residx_atom14_to_atom37': (4, 68, 14), 'residx_atom37_to_atom14': (4, 68, 37), 'atom37_atom_exists': (4, 68, 37), 'extra_msa': (4, 1024, 68), 'extra_msa_mask': (4, 1024, 68), 'extra_msa_row_mask': (4, 1024), 'bert_mask': (4, 512, 68), 'true_msa': (4, 512, 68), 'extra_has_deletion': (4, 1024, 68), 'extra_deletion_value': (4, 1024, 68), 'msa_feat': (4, 512, 68, 49), 'target_feat': (4, 68, 22)}
I0409 17:05:54.998405 139974862964544 model.py:175] Output shape was {'distogram': {'bin_edges': (63,), 'logits': (68, 68, 64)}, 'experimentally_resolved': {'logits': (68, 37)}, 'masked_msa': {'logits': (512, 68, 23)}, 'predicted_lddt': {'logits': (68, 50)}, 'structure_module': {'final_atom_mask': (68, 37), 'final_atom_positions': (68, 37, 3)}, 'plddt': (68,), 'ranking_confidence': ()}
I0409 17:05:54.998687 139974862964544 run_alphafold.py:203] Total JAX model model_5_pred_0 on query predict time (includes compilation time, see --benchmark): 132.6s
I0409 17:05:59.617951 139974862964544 amber_minimize.py:178] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 67 (GLU) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 17:05:59.703099 139974862964544 amber_minimize.py:408] Minimizing protein, attempt 1 of 100.
I0409 17:06:00.330353 139974862964544 amber_minimize.py:69] Restraining 574 / 1170 particles.
I0409 17:06:02.288254 139974862964544 amber_minimize.py:178] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 17:06:02.423708 139974862964544 amber_minimize.py:500] Iteration completed: Einit 1366.86 Efinal -2117.71 Time 1.52 s num residue violations 0 num residue exclusions 0
I0409 17:06:02.470051 139974862964544 run_alphafold.py:277] Final timings for query: {'features': 198.3622589111328, 'process_features_model_1_pred_0': 3.288827419281006, 'predict_and_compile_model_1_pred_0': 162.52829456329346, 'relax_model_1_pred_0': 8.678949356079102, 'process_features_model_2_pred_0': 1.8950560092926025, 'predict_and_compile_model_2_pred_0': 161.90068864822388, 'relax_model_2_pred_0': 5.575250148773193, 'process_features_model_3_pred_0': 2.012259006500244, 'predict_and_compile_model_3_pred_0': 130.9280300140381, 'relax_model_3_pred_0': 6.232355356216431, 'process_features_model_4_pred_0': 1.6244299411773682, 'predict_and_compile_model_4_pred_0': 129.57472157478333, 'relax_model_4_pred_0': 6.115770101547241, 'process_features_model_5_pred_0': 2.1679391860961914, 'predict_and_compile_model_5_pred_0': 132.64313626289368, 'relax_model_5_pred_0': 6.109053611755371}
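`!{cmd}` is IPython syntax, so it only works inside a notebook. The `subprocess` module imported in step 1 offers a plain-Python alternative. The sketch below builds the same `run_alphafold.sh` argument list as the step 4 command and runs it from the AlphaFold directory. The flags are copied from that command and have not been checked against the script. Passing a list rather than a shell string avoids quoting problems with paths. Both function names are hypothetical.

```python
import subprocess


def build_alphafold_args(database_dir, output_dir, fasta_path, max_template_date, use_precomputed_msas):
    """Hypothetical helper: argument list mirroring the run_alphafold.sh call in step 4."""
    return [
        "bash", "run_alphafold.sh",
        "-d", database_dir,
        "-o", output_dir,
        "-f", fasta_path,
        "-t", max_template_date,
        "-p", use_precomputed_msas,
    ]


def run_alphafold(alphafold_path, args):
    """Run the script with alphafold_path as the working directory (like the `cd` in step 4).

    Output streams straight to the terminal; a non-zero exit raises CalledProcessError.
    """
    return subprocess.run(args, cwd=alphafold_path, check=True)


# Values copied from the parameters in step 2.
args = build_alphafold_args(
    "/share/structure_prediction/af2_database/",
    "/opt/alphafold_project/alphafold-2.3.1/precomputed_test",
    "/opt/alphafold_project/alphafold-2.3.1/example/query.fasta",
    "2020-05-14",
    "true",
)
print(" ".join(args))
# run_alphafold("/opt/alphafold_project/alphafold-2.3.1", args)  # uncomment to launch the run
```

Inside the notebook, the same list can be built from the step 2 variables with `build_alphafold_args(DATABASE_DIR, os.path.abspath(OUTPUT_DIR), FASTA_PATH, MAX_TEMPLATE_DATE, USE_PRECOMPUTED_MSAS)`.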
5. Inspect the model ranking
Analyze the model ranking and the confidence scores.
[5]
import json

fasta_name = Path(FASTA_PATH).stem
# Read the ranking information
ranking_file = os.path.join(OUTPUT_DIR, fasta_name, 'ranking_debug.json')
if os.path.exists(ranking_file):
    with open(ranking_file, 'r') as f:
        ranking_data = json.load(f)
    # Monomer runs store per-model scores under 'plddts', multimer runs under 'iptm+ptm'
    confidence_key = 'iptm+ptm' if 'iptm+ptm' in ranking_data else 'plddts'
    print("Model ranking:")
    for i, model_name in enumerate(ranking_data['order']):
        confidence = ranking_data[confidence_key][model_name]
        print(f" {i+1}. {model_name} {confidence_key}: {confidence:.4f}")
    # 'order' runs from best to worst, so its first entry is the top-ranked model
    best_model_name = ranking_data['order'][0]
else:
    print(f"Ranking file not found: {ranking_file}")
Model ranking:
 1. model_3_pred_0 plddts: 88.0761
 2. model_1_pred_0 plddts: 86.5758
 3. model_2_pred_0 plddts: 86.1264
 4. model_4_pred_0 plddts: 85.5704
 5. model_5_pred_0 plddts: 84.4112
[6]
best_model_name
'model_3_pred_0'
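`ranking_data['order']` lists the model names from best to worst. AlphaFold writes the ranked structures in that same order: `ranked_0.pdb` is the best model, `ranked_1.pdb` the second, and so on. In this run `ranked_0.pdb` therefore holds `model_3_pred_0`, and the number inside a model name says nothing about its rank. The sketch below uses the scores printed above to make that mapping explicit and to confirm that the order follows the `plddts` values. The function name is hypothetical.

```python
def ranked_files(ranking_data):
    """Map ranked_<i>.pdb file names to the model names in ranking_data['order']."""
    return {f"ranked_{i}.pdb": name for i, name in enumerate(ranking_data["order"])}


# Values copied from the ranking printed in step 5.
ranking_data = {
    "plddts": {
        "model_1_pred_0": 86.5758,
        "model_2_pred_0": 86.1264,
        "model_3_pred_0": 88.0761,
        "model_4_pred_0": 85.5704,
        "model_5_pred_0": 84.4112,
    },
    "order": ["model_3_pred_0", "model_1_pred_0", "model_2_pred_0",
              "model_4_pred_0", "model_5_pred_0"],
}

# In this run the order is simply the models sorted by descending 'plddts' score.
scores = ranking_data["plddts"]
print(sorted(scores, key=scores.get, reverse=True) == ranking_data["order"])

for file_name, model_name in ranked_files(ranking_data).items():
    print(file_name, model_name, scores[model_name])
```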
6. Visualize the results
Once the prediction has finished, we can visualize the predicted protein structure.
[7]
def visualize_pdb(pdb_file, width=800, height=600):
    """Render a PDB file as a spinning cartoon with py3Dmol."""
    with open(pdb_file, 'r') as f:
        pdb_content = f.read()
    viewer = py3Dmol.view(width=width, height=height)
    viewer.addModel(pdb_content, 'pdb')
    viewer.setStyle({'cartoon': {'colorscheme': 'rainbow'}})
    viewer.spin(True)
    viewer.zoomTo()
    return viewer
[8]
# Visualize the top-ranked model.
# AlphaFold writes ranked_0.pdb for the best model, ranked_1.pdb for the next, and so on,
# so the file index is the rank, not the model number in best_model_name.
fasta_name = Path(FASTA_PATH).stem
predicted_pdb = os.path.join(OUTPUT_DIR, fasta_name, 'ranked_0.pdb')
if os.path.exists(predicted_pdb):
    view = visualize_pdb(predicted_pdb)
    view.show()
else:
    print(f"Prediction not found: {predicted_pdb}")
    print("Run the AlphaFold2 prediction first.")
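AlphaFold writes each residue's pLDDT into the B-factor column of the PDB files it produces, including `ranked_*.pdb`. That means per-residue confidence can also be read from the structure file itself, without the result pickles. The sketch below reads those values back by taking the B-factor (columns 61 to 66) of each CA atom. It is a minimal fixed-width parser written for illustration: the function name is hypothetical, and it ignores HETATM records and alternate locations.

```python
def plddt_from_pdb(pdb_text):
    """Return per-residue pLDDT values read from the B-factor column of CA atoms."""
    values = []
    for line in pdb_text.splitlines():
        # PDB is fixed-width: atom name in columns 13-16, B-factor in columns 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))
    return values


if __name__ == "__main__":
    # Two synthetic ATOM records (N and CA of one glycine) formatted to PDB column widths.
    demo = "\n".join(
        f"ATOM  {i:>5} {name:<4} GLY A{1:>4}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{87.5:6.2f}"
        for i, name in [(1, "N"), (2, "CA")]
    )
    print(plddt_from_pdb(demo))  # → [87.5]
```

Applied to the text of `predicted_pdb` from this run, it should return one value per residue (68 here).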
7. Assess prediction quality
Plot the per-residue pLDDT scores to judge the quality of the prediction.
[9]
import json
import pickle

def plot_plddt(output_dir, fasta_name):
    """Plot per-residue pLDDT scores for every model in the output directory."""
    result_dir = os.path.join(output_dir, fasta_name)
    # sorted() keeps the legend in model order; os.listdir returns files in arbitrary order
    result_files = sorted(f for f in os.listdir(result_dir)
                          if f.startswith('result_') and f.endswith('.pkl'))
    if not result_files:
        print("No result files found")
        return
    plt.figure(figsize=(12, 6))
    for result_file in result_files:
        model_name = result_file.replace('result_', '').replace('.pkl', '')
        with open(os.path.join(result_dir, result_file), 'rb') as f:
            result = pickle.load(f)
        plddt = result['plddt']
        plt.plot(plddt, label=model_name)
    # pLDDT = 70 is the usual cutoff between confident and low-confidence regions
    plt.axhline(y=70, color='r', linestyle='--', label='pLDDT=70')
    plt.xlabel('residue position')
    plt.ylabel('pLDDT score')
    plt.title('prediction quality (pLDDT)')
    plt.legend()
    plt.grid(True)
    plt.show()
[10]
# Plot the pLDDT scores
fasta_name = Path(FASTA_PATH).stem
plot_plddt(OUTPUT_DIR, fasta_name)
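The dashed line at 70 follows the usual pLDDT interpretation, as used in the AlphaFold Protein Structure Database legend:

- above 90: very high confidence
- 70 to 90: confident
- 50 to 70: low
- below 50: very low

The sketch below counts how many residues of one model fall into each band. It assigns values that sit exactly on an edge to the lower band, which is an arbitrary choice. The function name is hypothetical.

```python
import numpy as np


def plddt_bands(plddt):
    """Count residues per pLDDT confidence band (edge values go to the lower band)."""
    plddt = np.asarray(plddt, dtype=float)
    return {
        "very high (>90)": int(np.sum(plddt > 90)),
        "confident (70-90)": int(np.sum((plddt > 70) & (plddt <= 90))),
        "low (50-70)": int(np.sum((plddt > 50) & (plddt <= 70))),
        "very low (<=50)": int(np.sum(plddt <= 50)),
    }


print(plddt_bands([95.2, 88.0, 71.3, 65.0, 40.1]))
# → {'very high (>90)': 1, 'confident (70-90)': 2, 'low (50-70)': 1, 'very low (<=50)': 1}
```

For a real model, pass `result['plddt']` loaded from one of the `result_*.pkl` files read by `plot_plddt`.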
8. Inspect timings
See how long each step of the prediction took.
[11]
# Read the timing information
timings_file = os.path.join(OUTPUT_DIR, fasta_name, 'timings.json')
if os.path.exists(timings_file):
    with open(timings_file, 'r') as f:
        timings_data = json.load(f)
    print("Time per step (seconds):")
    for step, seconds in timings_data.items():
        print(f" {step}: {seconds:.2f}")
else:
    print(f"Timing file not found: {timings_file}")
Time per step (seconds):
 features: 198.36
 process_features_model_1_pred_0: 3.29
 predict_and_compile_model_1_pred_0: 162.53
 relax_model_1_pred_0: 8.68
 process_features_model_2_pred_0: 1.90
 predict_and_compile_model_2_pred_0: 161.90
 relax_model_2_pred_0: 5.58
 process_features_model_3_pred_0: 2.01
 predict_and_compile_model_3_pred_0: 130.93
 relax_model_3_pred_0: 6.23
 process_features_model_4_pred_0: 1.62
 predict_and_compile_model_4_pred_0: 129.57
 relax_model_4_pred_0: 6.12
 process_features_model_5_pred_0: 2.17
 predict_and_compile_model_5_pred_0: 132.64
 relax_model_5_pred_0: 6.11
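`timings.json` has one entry per stage and model, so summing by stage shows where the time went. In this run, `features` took about 198 s, most of it the 188 s HHsearch template search seen in the step 4 log, because the MSAs were precomputed. The five model predictions together took about 718 s, and the whole run about 16 minutes. The sketch below groups keys by the text before `_model_`. That grouping rule is an assumption about the key names, and the values are the rounded numbers printed above.

```python
from collections import defaultdict

# Values copied (rounded to 2 decimals) from the timings printed above.
timings_data = {
    "features": 198.36,
    "process_features_model_1_pred_0": 3.29, "predict_and_compile_model_1_pred_0": 162.53, "relax_model_1_pred_0": 8.68,
    "process_features_model_2_pred_0": 1.90, "predict_and_compile_model_2_pred_0": 161.90, "relax_model_2_pred_0": 5.58,
    "process_features_model_3_pred_0": 2.01, "predict_and_compile_model_3_pred_0": 130.93, "relax_model_3_pred_0": 6.23,
    "process_features_model_4_pred_0": 1.62, "predict_and_compile_model_4_pred_0": 129.57, "relax_model_4_pred_0": 6.12,
    "process_features_model_5_pred_0": 2.17, "predict_and_compile_model_5_pred_0": 132.64, "relax_model_5_pred_0": 6.11,
}


def timings_by_stage(timings):
    """Sum timing entries by stage, taking the stage as the key prefix before '_model_'."""
    totals = defaultdict(float)
    for key, seconds in timings.items():
        stage = key.split("_model_")[0]  # 'features' has no suffix and is kept as-is
        totals[stage] += seconds
    return dict(totals)


totals = timings_by_stage(timings_data)
for stage, seconds in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{stage:>20}: {seconds:8.2f} s")
total = sum(totals.values())
print(f"{'total':>20}: {total:8.2f} s ({total / 60:.1f} min)")
```

With the notebook's own `timings_data` loaded from `timings.json`, the same function gives the unrounded totals.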
References
- Jumper, John, et al. "Highly accurate protein structure prediction with AlphaFold." Nature 596.7873 (2021): 583-589.
- Project repository: https://github.com/google-deepmind/alphafold