Unifold Batch Inference
csg
Published on 2023-09-24
Recommended image: unfold-v221-batch-notebook:v221
Recommended machine type: c12_m92_1 * NVIDIA V100

©️ Copyright 2023 @ Authors
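Authors: 陈述高, 杨舒文, 李子尧
Date: 2023-09-13
License: This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Quick start: click the Connect button above, then select the unfold-v221-batch-notebook image and any machine type to get started.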


Uni-Fold Notebook

This notebook provides protein structure prediction with Uni-Fold as well as UF-Symmetry. Both protein monomers and multimers are supported. The homology search in this notebook uses the MMseqs2 server provided by ColabFold. For results more consistent with the original AlphaFold(-Multimer), please refer to the open-source repository of Uni-Fold, or to our web server, Hermite™.

Please note that this notebook is provided as an early-access prototype, and is NOT an official product of DP Technology. It is provided for theoretical modeling only and caution should be exercised in its use.

Licenses

This Colab uses the Uni-Fold model parameters and its outputs are under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) license. You can find details at: https://creativecommons.org/licenses/by/4.0/legalcode. The Colab itself is provided under the Apache 2.0 license.

Citations

Please cite the following papers if you use this notebook:

Acknowledgements

The model architecture of Uni-Fold is largely based on AlphaFold and AlphaFold-Multimer. The design of this notebook draws directly on ColabFold. We especially thank @sokrypton for his helpful suggestions on this notebook.

Copyright © 2022 DP Technology. All rights reserved.

[1]
# Uni-Fold running configuration. It is recommended not to change these settings unless you are familiar with them.

import warnings
warnings.filterwarnings("ignore")
import os
import json
from unifold.colab.data import validate_input, get_features

#@title Provide the arguments here and hit `Run` -> `Run All Cells`
jobname = 'unifold_batch_colab' #@param {type:"string"}
use_templates = True #@param {type:"boolean"}
msa_mode = "MMseqs2" #@param ["MMseqs2","single_sequence"]
#@markdown Parameters for model inference.
max_recycling_iters = 3 #@param {type:"integer"}
num_ensembles = 2 #@param {type:"integer"}
manual_seed = 42 #@param {type:"integer"}
times = 1 #@param {type:"integer"}
#@markdown Plotting parameters.
show_sidechains = False #@param {type:"boolean"}
dpi = 100 #@param {type:"integer"}
max_display_cnt = 3 #@param {type:"integer"}

MIN_SINGLE_SEQUENCE_LENGTH = 6
MAX_SINGLE_SEQUENCE_LENGTH = 3000
MAX_MULTIMER_LENGTH = 3000
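The three length constants above are passed to `validate_input` as `min_length`, `max_length`, and `max_multimer_length` in the next code cell. The sketch below shows one plausible reading of how they interact, assuming the first two bound each individual chain and the third bounds the summed length of all chains in a multimer. `check_lengths` is a hypothetical helper for illustration only, not part of Uni-Fold, and the library's actual checks may differ.

```python
from typing import List

# Limits copied from the configuration cell above.
MIN_SINGLE_SEQUENCE_LENGTH = 6
MAX_SINGLE_SEQUENCE_LENGTH = 3000
MAX_MULTIMER_LENGTH = 3000


def check_lengths(sequences: List[str]) -> int:
    """Hypothetical length check (not Uni-Fold's validate_input).

    Returns the total residue count if every limit is satisfied,
    otherwise raises ValueError.
    """
    for i, seq in enumerate(sequences):
        n = len(seq)
        # Assumption: the single-sequence limits apply to every chain.
        if not MIN_SINGLE_SEQUENCE_LENGTH <= n <= MAX_SINGLE_SEQUENCE_LENGTH:
            raise ValueError(
                f"chain {i} has {n} residues; allowed range is "
                f"[{MIN_SINGLE_SEQUENCE_LENGTH}, {MAX_SINGLE_SEQUENCE_LENGTH}]"
            )
    total = sum(len(seq) for seq in sequences)
    # Assumption: the multimer limit applies to the summed length of all chains.
    if len(sequences) > 1 and total > MAX_MULTIMER_LENGTH:
        raise ValueError(f"total length {total} exceeds {MAX_MULTIMER_LENGTH}")
    return total


print(check_lengths(["ACDEFGHIKL", "MNPQRSTVWY"]))  # prints 20
```

Under this reading, two chains of 2,000 residues each would pass the per-chain check but fail the multimer cap.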

CONFIGURATION

Set up the input (either from a file or by filling in input_json directly) and the output path.

  • jobname (str): name of the job, used as the prefix of output directories.
  • input_json_path (str): path to the input JSON file, which contains a list or dict of proteins. If it is a list, the list indices are used as IDs; if it is a dict, its keys are used as IDs. Each protein is a dict with the keys:
    • symmetry: the protein's symmetry group. Defaults to "C1".
    • sequence: the sequences of the asymmetric unit, separated by ";".
    • id (optional): if absent, the protein's position in the list is used.
    • any other keys you want to add; they are kept in the output summary.
  • output_dir_base (str): root directory of output files.
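The `symmetry` and `sequence` keys together describe the full assembly. The following is a minimal sketch, assuming that only cyclic groups written as `Cn` are accepted (the demo cases use C1, C2 and C3) and that a `Cn` assembly contains n copies of the asymmetric unit. The helper names `cyclic_order`, `asymmetric_unit`, and `assembly_chain_count` are hypothetical and not part of Uni-Fold.

```python
import re
from typing import List

# Assumption: only cyclic groups written as "C<n>" (n >= 1) are supported.
_CYCLIC_GROUP = re.compile(r"^C([1-9][0-9]*)$")


def cyclic_order(symmetry: str) -> int:
    """Return n for a cyclic group "Cn", e.g. "C3" -> 3."""
    match = _CYCLIC_GROUP.match(symmetry.strip())
    if match is None:
        raise ValueError(f"unsupported symmetry group: {symmetry!r}")
    return int(match.group(1))


def asymmetric_unit(sequence_field: str) -> List[str]:
    """Split the ';'-separated sequence field into AU chains, dropping empties."""
    return [s.strip() for s in sequence_field.split(";") if s.strip()]


def assembly_chain_count(sequence_field: str, symmetry: str = "C1") -> int:
    """Chains in the full assembly, assuming n copies of the AU for group Cn."""
    return cyclic_order(symmetry) * len(asymmetric_unit(sequence_field))


print(assembly_chain_count("ACDEFGHIKL;MNPQRSTVWY", "C2"))  # prints 4
```

Under these assumptions, a C2 entry whose asymmetric unit has two `;`-separated chains describes a four-chain assembly.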

Example of the list format:

input_json = [
        {'sequence': 'MGSSHHHHHHSSGLVPRGSHMEDRDPTQFEERHLKFLQQLGKGNFGSVEMCRYDPLQDNTGEVVAVKKLQHSTEEHLRDFEREIEILKSLQHDNIVKYKGVCYSAGRRNLKLIMEYLPYGSLRDYLQKHKERIDHIKLLQYTSQICKGMEYLGTKRYIHRDLATRNILVENENRVKIGDFGLTKVLPQDKEFFKVKEPGESPIFWYAPESLTESKFSVASDVWSFGVVLYELFTYIEKSKSPPAEFMRMIGNDKQGQMIVFHLIELLKNNGRLPRPDGCPDEIYMIMTECWNNNVNQRPSFRDLALRVDQIRDNMAG'},
        {'symmetry': 'C2', 'sequence': 'GSHMKNVLIGVQTNLGVNKTGTEFGPDDLIQAYPDTFDEMELISVERQKEDFNDKKLKFKNTVLDTCEKIAKRVNEAVIDGYRPILVGGDHSISLGSVSGVSLEKEIGVLWISAHGDMNTPESTLTGNIHGMPLALLQGLGDRELVNCFYEGAKLDSRNIVIFGAREIEVEERKIIEKTGVKIVYYDDILRKGIDNVLDEVKDYLKIDNLHISIDMNVFDPEIAPGVSVPVRRGMSYDEMFKSLKFAFKNYSVTSADITEFNPLNDINGKTAELVNGIVQYMMNPDY'},
        {'symmetry': 'C2', 'sequence': 'GGSGGSGGSGGSLFCEQVTTVTNLFEKWNDCERTVVMYALLKRLRYPSLKFLQYSIDSNLTQNLGTSQTNLSSVVIDINANNPVYLQNLLNAYKTARKEDILHEVLNMLPLLKPGNEEAKLIYLTLIPVAVKDTMQQIVPTELVQQIFSYLLIHPAITSEDRRSLNIWLRHLEDHIQ;SVPSYGEDELQQAMRLLNAASRQRTEAANEDFGGT'},
        {'symmetry': 'C3', 'sequence': 'LILNLRGGAFVSNTQITMADKQKKFINEIQEGDLVRSYSITDETFQQNAVTSIVKHEADQLCQINFGKQHVVCTVNHRFYDPESKLWKSVCPHPGSGISFLKKYDYLLSEEGEKLQITEIKTFTTKQPVFIYHIQVENNHNFFANGVLAHAMQVSI'},
    ]
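To run your own batch, a list like this can be saved to a JSON file and `input_json_path` pointed at it. A minimal sketch using only the standard library; the toy sequences (not real proteins), the `write_input_json` helper, and the file name `my_batch.json` are placeholders, and a temporary directory is used so the snippet runs anywhere.

```python
import json
import os
import tempfile

# Toy placeholder sequences (not real proteins); replace them with your own.
input_list = [
    {"sequence": "ACDEFGHIKLMNPQRSTVWY"},  # no symmetry key: treated as C1
    {"id": "toy_dimer", "symmetry": "C2", "sequence": "ACDEFGHIKL;MNPQRSTVWY"},
]


def write_input_json(tasks, path):
    """Write the task list as UTF-8 JSON and return the path."""
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(tasks, fp, indent=2)
    return path


# A temporary directory keeps this sketch self-contained.
input_json_path = write_input_json(
    input_list, os.path.join(tempfile.mkdtemp(), "my_batch.json")
)
with open(input_json_path, encoding="utf-8") as fp:
    reloaded = json.load(fp)
print(reloaded == input_list)  # prints True
```

On the platform you would write the file somewhere persistent instead (for example under `/data`) and set `input_json_path` to that path.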

The dict format, in which each key serves as the protein's ID, is shown in the demo case below:

[2]
output_dir_base = "/data/prediction" #@param {type:"string"}
os.makedirs(output_dir_base, exist_ok=True)

# Replace with the path to your own JSON file; if it does not exist, the demo case below is used.
input_json_path = 'your json file.json'


if os.path.isfile(input_json_path):
    with open(input_json_path, encoding="utf-8") as fp:
        input_json = json.load(fp)
    default_list_case = False
    default_dict_case = False
else:  # A demo case in dict format; the list format is shown above.
    input_json = {
        '7teu': {'sequence': 'MGSSHHHHHHSSGLVPRGSHMEDRDPTQFEERHLKFLQQLGKGNFGSVEMCRYDPLQDNTGEVVAVKKLQHSTEEHLRDFEREIEILKSLQHDNIVKYKGVCYSAGRRNLKLIMEYLPYGSLRDYLQKHKERIDHIKLLQYTSQICKGMEYLGTKRYIHRDLATRNILVENENRVKIGDFGLTKVLPQDKEFFKVKEPGESPIFWYAPESLTESKFSVASDVWSFGVVLYELFTYIEKSKSPPAEFMRMIGNDKQGQMIVFHLIELLKNNGRLPRPDGCPDEIYMIMTECWNNNVNQRPSFRDLALRVDQIRDNMAG'},
        '8d27': {'symmetry': 'C2', 'sequence': 'GSHMKNVLIGVQTNLGVNKTGTEFGPDDLIQAYPDTFDEMELISVERQKEDFNDKKLKFKNTVLDTCEKIAKRVNEAVIDGYRPILVGGDHSISLGSVSGVSLEKEIGVLWISAHGDMNTPESTLTGNIHGMPLALLQGLGDRELVNCFYEGAKLDSRNIVIFGAREIEVEERKIIEKTGVKIVYYDDILRKGIDNVLDEVKDYLKIDNLHISIDMNVFDPEIAPGVSVPVRRGMSYDEMFKSLKFAFKNYSVTSADITEFNPLNDINGKTAELVNGIVQYMMNPDY'},
        '8oij': {'symmetry': 'C2', 'sequence': 'GGSGGSGGSGGSLFCEQVTTVTNLFEKWNDCERTVVMYALLKRLRYPSLKFLQYSIDSNLTQNLGTSQTNLSSVVIDINANNPVYLQNLLNAYKTARKEDILHEVLNMLPLLKPGNEEAKLIYLTLIPVAVKDTMQQIVPTELVQQIFSYLLIHPAITSEDRRSLNIWLRHLEDHIQ;SVPSYGEDELQQAMRLLNAASRQRTEAANEDFGGT'},
        'c2404': {'symmetry': 'C3', 'sequence': 'LILNLRGGAFVSNTQITMADKQKKFINEIQEGDLVRSYSITDETFQQNAVTSIVKHEADQLCQINFGKQHVVCTVNHRFYDPESKLWKSVCPHPGSGISFLKKYDYLLSEEGEKLQITEIKTFTTKQPVFIYHIQVENNHNFFANGVLAHAMQVSI'},
    }


def process_batch_json(tasks, jobname):
    # Normalize dict input into a list, using each key as the task id.
    if isinstance(tasks, dict):
        new_tasks = []
        for k, v in tasks.items():
            v['id'] = k
            new_tasks.append(v)
        tasks = new_tasks
    # Check the input.
    for idx, task in enumerate(tasks):
        if 'id' not in task.keys():
            task['id'] = idx
        if 'sequence' not in task.keys():
            raise KeyError(f"'sequence' not found in task #{idx + 1}; available keys: {list(task.keys())}.")
        target_id = f"{jobname}_{task['id']}"
        input_sequences = task['sequence'].strip().split(';')
        task['target_id'] = target_id
        if 'symmetry' not in task.keys():
            task['symmetry'] = 'C1'
        symmetry_group = task['symmetry']
        # Validate the sequences against the length limits set in the first cell.
        sequences, is_multimer, symmetry_group = validate_input(
            input_sequences=input_sequences,
            symmetry_group=symmetry_group,
            min_length=MIN_SINGLE_SEQUENCE_LENGTH,
            max_length=MAX_SINGLE_SEQUENCE_LENGTH,
            max_multimer_length=MAX_MULTIMER_LENGTH)
        task['is_multimer'] = is_multimer
        # Save features to `output_dir_base`.
        feature_output_dir = get_features(
            jobname=jobname,
            target_id=target_id,
            sequences=sequences,
            output_dir_base=output_dir_base,
            is_multimer=is_multimer,
            msa_mode=msa_mode,
            use_templates=use_templates
        )
        task['feature_output_dir'] = feature_output_dir
        # Store None instead of 'C1' so that no symmetry group is passed to inference.
        task['symmetry'] = task['symmetry'] if task['symmetry'] != 'C1' else None

    return tasks


all_tasks = process_batch_json(input_json, jobname)
Using the single-chain model.
WARNING:absl:The exact sequence DPTQFEERHLKFLQQLGKGNFGSVEMCRYDPLQDGEVVAVKKLQHSTEEHLRDFEREIEILKSLQHDNIVKYKGVCYSAGRRNLKLIMEYLPYGSLRDYLQKHKERIDHIKLLQYTSQICKGMEYLGTKRYIHRDLATRNILVENENRVKIGDFGLTKVLPQDKEYYKVKEPGESPIFWYAPESLTESKFSVASDVWSFGVVLYELFTYIEKSKSPPAEFMRMIGNDKQGQMIVFHLIELLKNNGRLPRPDGCPDEIYMIMTECWNNNVNQRPSFRDLALRVDQIRDNMAG was not found in 7ll5_A. Realigning the template to the actual sequence.
WARNING:absl:The exact sequence GDPTQFEERHLKFLQQLGKGNFGSVEMCRYDPLQDNTGEVVAVKKLQHSTEEHLRDFEREIEILKSLQHDNIVKYKGVCYSGRRNLKLIMEYLPYGSLRDYLQKHKERIDHIKLLQYTSQICKGMEYLGTKRYIHRDLATRNILVENENRVKIGDFGLTKVLPQDKEYYKVKEPGESPIFWYAPESLTESKFSVASDVWSFGVVLYELFTYIEKSKSPPAEFMRMIGNDKQGQMIVFHLIELLKNNGRLPRPDGCPDEIYMIMTECWNNNVNQRPSFRDLALRVDQIRDNMAG was not found in 6vnk_A. Realigning the template to the actual sequence.
WARNING:absl:The exact sequence PTQFEERHLKFLQQLGKGNFGSVEMCRYDPLQDNTGEVVAVKKLQHSTEEHLRDFEREIEILKSLQHDNIVKYKGVCYSAGRNLKLIMEYLPYGSLRDYLQKHKERIDHIKLLQYTSQICKGMEYLGTKRYIHRDLATRNILVENENRVKIGDFGLTKVLPQDKEYYKVKEPGESPIFWYAPESLTESKFSVASDVWSFGVVLYELFTYIEKSKSPPAEFMRMIGNDKQGQMIVFHLIELLKNNGRLPRPDGCPDEIYMIMTECWNNNVNQRPSFRDLALRVDQIRDNMAG was not found in 5cf4_B. Realigning the template to the actual sequence.
WARNING:absl:The exact sequence DPTQFEERHLKFLQQLGKGFGSVEMCRYDPLQDNTGEVVAVKKLQHSTEEHLRDFEREIEILKSLQHDNIVKYKGVCYSAGRRNLKLIMEYLPYGSLRDYLQKHKERIDHIKLLQYTSQICKGMEYLGTKRYIHRDLATRNILVENENRVKIGDFGLTKVLPQDKEYYKVKEPGESPIFWYAPESLTESKFSVASDVWSFGVVLYELFTYIEKSKSPPAEFMRMIGNDKQGQMIVFHLIELLKNNGRLPRPDGCPDEIYMIMTECWNNNVNQRPSFRDLALRVDQIRDN was not found in 4d0w_A. Realigning the template to the actual sequence.
WARNING:absl:The exact sequence FEERHLKFLQQLGKGNFGSVEMCRYDPLQDNTGEVVAVKKLQHSTEEHLRDFEREIEILKSLQHDNIVKYKGVCYNLKLIMEYLPYGSLRDYLQKHKERIDHIKLLQYTSQICKGMEYLGTKRYIHRDLATRNILVENENRVKIGDFGLTKVLPQDKEYYKVKEPGESPIFWYAPESLTESKFSVASDVWSFGVVLYELFTYIEKSKSPPAEFMRMIGNDKQGQMIVFHLIELLKNNGRLPRPDGCPDEIYMIMTECWNNNVNQRPSFRDLALRVDQIRDNMAG was not found in 5cf8_A. Realigning the template to the actual sequence.
WARNING:absl:The exact sequence PTQFEERHLKFLRQLGKGNFGSVEMCRYDPLQDNTGEVVAVKKLQHSTEEHLRDFEREIEILKSLQHDNIVKYKGVCYNLKLIMEFLPYGSLREYLQKHKERIDHIKLLQYTSQICKGMEYLGTKRYIHRDLATRNILVENENRVKIGDFGLTKVLPQDKEYYKVKEPGESPIFWYAPESLTESKFSVASDVWSFGVVLYELFTYIEKSKSPPAEFMRMIGNDKQGQMIVFHLIELLKNNGRLPRPDGCPDEIYMIMTECWNNNVNQRPSFRDLALRVDQIRDNMAG was not found in 4e6d_B. Realigning the template to the actual sequence.
WARNING:absl:The exact sequence EERHLKFLQQLGKGNFGSVEMCRYDPLQDNTGEVVAVKKLQHSTEEHLRDFEREIEILKSLQHDNIVKYKGVCYSNLKLIMEYLPYGSLRDYLQKHKERIDHIKLLQYTSQICKGMEYLGTKRYIHRDLATRNILVENENRVKIGDFGLTKVLPQDKEYYKVKEPGESPIFWYAPESLTESKFSVASDVWSFGVVLYELFTYIEKSKSPPAEFMRMIGNDKQGQMIVFHLIELLKNNGRLPRPDGCPDEIYMIMTECWNNNVNQRPSFRDLALRVDQIRDNMAG was not found in 3tjd_A. Realigning the template to the actual sequence.
WARNING:absl:The exact sequence FEDRDPTQFEERHLKFLQQLGKGSVEMCRYDPLQDNTGEVVAVKKLQHSTEEHLRDFEREIEILKSLQHDNIVKYKGVCYSAGRRNLKLIMEYLPYGSLRDYLQKHKERIDHIKLLQYTSQICKGMEYLGTKRYIHRDLATRNILVENENRVKIGDFGLTKVLPQDKEYYKVKEPGESPIFWYAPESLTESKFSVASDVWSFGVVLYELFTYIEKSKSPPAEFMRMIGNDKQGQMIVFHLIELLKNNGRLPRPDGCPDEIYMIMTECWNNNVNQRPSFRDLALRVDQIRDN was not found in 5wev_A. Realigning the template to the actual sequence.
WARNING:absl:The exact sequence DPTQFEERHLKFLQQLGKGNFGSVEMCRYDPLQDNTGEVVAVKKLQHSTEEHLRDFEREIEILKSLQHDNIVKYKGVCYSAGRRNLKLIMEYLPYGSLRDYLQKHKERIDHIKLLQYTSQICKGMEYLGTKRYIHRDLATRNILVENENRVKIGDFGLTKVLPQDKEYYKVKESPIFWYAPESLTESKFSVASDVWSFGVVLYELFTYIEKSKSPPAEFMRMIGNDKQGQMIVFHLIELLKNNGRLPRPDGCPDEIYMIMTECWNNNVNQRPSFRDLALRVDQIRDNM was not found in 5usy_B. Realigning the template to the actual sequence.
WARNING:absl:The exact sequence EERHLKFLQQLGKGNFGSVEMCRYDPLQDNTGEVVAVKKLQHSTEEHLRDFEREIEILKSLQHDNIVKYKGVCYSNLKLIMEYLPYGSLRDYLQKHKERIDHIKLLQYTSQICKGMEYLGTKRYIHRDLATRNILVENENRVKIGDFGLTKVLPQDKEYYKVKEPGESPIFWYAPESLTESKFSVASDVWSFGVVLYELFTYIEKSKSPPAEFMRMIGNDKQGQMIVFHLIELLKNNGRLPRPDGCPDEIYMIMTECWNNNVNQRPSFRDLALRVDQIRDQMAG was not found in 2b7a_A. Realigning the template to the actual sequence.
WARNING:absl:The exact sequence QFEERHLKFLQQLGKGNFGSVEMCRYDPLQDNTGEVVAVKKLQHSTEEHLRDFEREIEILKSLQHDNIVKYKGVCYSAGRRNLKLIMEYLPYGSLRDYLQKHKERIDHIKLLQYTSQICKGMEYLGTKRYIHRDLATRNILVENENRVKIGDFGLTKVLPQDKEYYKVSPIFWYAPESLTESKFSVASDVWSFGVVLYELFTYIEKSKSPPAEFMRMIGNDKQGQMIVFHLIELLKNNGRLPRPDGCPDEIYMIMTECWNNNVNQRPSFRDLALRVDQIRDNMAG was not found in 3rvg_A. Realigning the template to the actual sequence.
Using UF-Symmetry with group C2. If you do not want to use UF-Symmetry, please use `C1` and copy the AU sequences to the count in the assembly.
WARNING:absl:The exact sequence KEISVIGVPMDLGQMRRGVDMGPSAIRYAGVIERIEEIGYDVKDMGDICIENTKLRNLTQVATVCNELASKVDHIIEEGRFPLVLGGDHSIAIGTLAGVAKHYKNLGVIWYDAHGDLNTEETSPSGNIHGMSLAASLGYGHSSLVDLYGAYPKVKKENVVIIGARALDEGEKDFIRNEGIKVFSMHEIDRMGMTAVMEETIAYLSHTDGVHLSLDLDGLDPHDAPGVGTPVIGGLSYRESHLAMEMLAEADIITSAEFVEVNTILDERNRTATTAVALMGSLFGE was not found in 6nbk_D. Realigning the template to the actual sequence.
WARNING:absl:The exact sequence KEISVIGVPMDLGQMRRGVDMGPSAIRYAGVIERIEEIGYDVKDMGDICINTKLRNLTQVATVCNELASKVDHIIEEGRFPLVLGGDHSIAIGTLAGVAKHYKNLGVIWYDAHGDLNTEETSPSGNIHGMSLAASLGYGHSSLVDLYGAYPKVKKENVVIIGARALDEGEKDFIRNEGIKVFSMHEIDRMGMTAVMEETIAYLSHTDGVHLSLDLDGLDPHDAPGVGTPVIGGLSYRESHLAMEMLAEADIITSAEFVEVNTILDERNRTATTAVALMGSLFGE was not found in 6nbk_C. Realigning the template to the actual sequence.
WARNING:absl:The exact sequence KEISVIGVPMDLGQMRRGVDMGPSAIRYAGVIERIEEIGYDVKDMGDICIEENTKLRNLTQVATVCNELASKVDHIIEEGRFPLVLGGDHSIAIGTLAGVAKHYKNLGVIWYDAHGDLNTEETSPSGNIHGMSLAASLGYGHSSLVDLYGAYPKVKKENVVIIGARALDEGEKDFIRNEGIKVFSMHEIDRMGMTAVMEETIAYLSHTDGVHLSLDLDGLDPHDAPGVGTPVIGGLSYRESHLAMEMLAEADIITSAEFVEVNTILDERNRTATTAVALMGSLFGE was not found in 6nbk_A. Realigning the template to the actual sequence.
WARNING:absl:The exact sequence DKTISVIGMPMDLGQARRGVDMGPSAIRYAHLIERLSDMGYTVEDLGDIPINELKNLNSVLAGNEKLAQKVNKVIEEKKFPLVLGGDHSIAIGTLAGTAKHYDNLGVIWYDAHGDLNTLETSPSGNIHGMPLAVSLGIGHESLVNLEGYAPKIKPENVVIIGARSLDEGERKYIKESGMKVYTMHEIDRLGMTKVIEETLDYLSACDGVHLSLDLDGLDPNDAPGVGTPVVGGISYRESHLAMEMLYDAGIITSAEFVEVNPILDHKNKTGKTAVELVESLLGK was not found in 6nfp_E. Realigning the template to the actual sequence.
WARNING:absl:The exact sequence DKTISVIGMPMDLGQARRGVDMGPSAIRYAHLIERLSDMGYTVEDLGDIPINREELKNLNSVLAGNEKLAQKVNKVIEEKKFPLVLGGDHSIAIGTLAGTAKHYDNLGVIWYDAHGDLNTLETSPSGNIHGMPLAVSLGIGHESLVNLEGYAPKIKPENVVIIGARSLDEGERKYIKESGMKVYTMHEIDRLGMTKVIEETLDYLSACDGVHLSLDLDGLDPNDAPGVGTPVVGGISYRESHLAMEMLYDAGIITSAEFVEVNPILDHKNKTGKTAVELVESLLGK was not found in 6nfp_F. Realigning the template to the actual sequence.
WARNING:absl:The exact sequence KTISVIGMPMDLGQARRGVDMGPSAIRYAHLIERLSDMGYTVEDLGDIPINREDEELKNLNSVLAGNEKLAQKVNKVIEEKKFPLVLGGDHSIAIGTLAGTAKHYDNLGVIWYDAHGDLNTLETSPSGNIHGMPLAVSLGIGHESLVNLEGYAPKIKPENVVIIGARSLDEGERKYIKESGMKVYTMHEIDRLGMTKVIEETLDYLSACDGVHLSLDLDGLDPNDAPGVGTPVVGGISYRESHLAMEMLYDAGIITSAEFVEVNPILDHKNKTGKTAVELVESLLGK was not found in 6nfp_C. Realigning the template to the actual sequence.
WARNING:absl:The exact sequence KTISVIGMPMDLGQARRGVDMGPSAIRYAHLIERLSDMGYTVEDLGDIPINREKIDEELKNLNSVLAGNEKLAQKVNKVIEEKKFPLVLGGDHSIAIGTLAGTAKHYDNLGVIWYDAHGDLNTLETSPSGNIHGMPLAVSLGIGHESLVNLEGYAPKIKPENVVIIGARSLDEGERKYIKESGMKVYTMHEIDRLGMTKVIEETLDYLSACDGVHLSLDLDGLDPNDAPGVGTPVVGGISYRESHLAMEMLYDAGIITSAEFVEVNPILDHKNKTGKTAVELVESLLGKK was not found in 6nfp_A. Realigning the template to the actual sequence.
WARNING:absl:The exact sequence KTISVIGMPMDLGQARRGVDMGPSAIRYAHLIERLSDMGYTVEDLGDIPINNLNSVLAGNEKLAQKVNKVIEEKKFPLVLGGDHSIAIGTLAGTAKHYDNLGVIWYDAHGDLNTLETSPSGNIHGMPLAVSLGIGHESLVNLEGYAPKIKPENVVIIGARSLDEGERKYIKESGMKVYTMHEIDRLGMTKVIEETLDYLSACDGVHLSLDLDGLDPNDAPGVGTPVVGGISYRESHLAMEMLYDAGIITSAEFVEVNPILDHKNKTGKTAVELVESLLGK was not found in 6dkt_D. Realigning the template to the actual sequence.
WARNING:absl:The exact sequence KTISVIGMPMDLGQARRGVDMGPSAIRYAHLIERLSDMGYTVEDLGDIPINNLNSVLAGNEKLAQKVNKVIEEKKFPLVLGGDHSIAIGTLAGTAKHYDNLGVIWYDAHGDLNTLESGNIHGMPLAVSLGIGHESLVNLEGYAPKIKPENVVIIGARSLDEGERKYIKESGMKVYTMHEIDRLGMTKVIEETLDYLSACDGVHLSLDLDGLDPNDAPGVGTPVVGGISYRESHLAMEMLYDAGIITSAEFVEVNPILDHKNKTGKTAVELVESLLGK was not found in 6dkt_F. Realigning the template to the actual sequence.
WARNING:absl:The exact sequence RVAVVGVPMDLGANRRGVDMGPSALRYARLLEQLEDLGYTVEDLGDVPVSLARLAYLEEIRAAALVLKERLAALPEGVFPIVLGGDHSLSMGSVAGAARGRRVGVVWVDAHADFNTPETSPSGNVHGMPLAVLSGLGHPRLTEVFRAVDPKDVVLVGVRSLDPGEKRLLKEAGVRVYTMHEVDRLGVARIAEEVLKHLQGLPLHVSLDADVLDPTLAPGVGTPVPGGLTYREAHLLMEILAESGRVQSLDLVEVNPILDERNRTAEMLVGLALSLLGKR was not found in 2ef4_A. Realigning the template to the actual sequence.
WARNING:absl:The exact sequence RVAVVGVPMDLGVDMGPSALRYARLLEQLEDLGYTVEDLGDVPVSLAYLEEIRAAALVLKERLAALPEGVFPIVLGGDHSLSMGSVAGAARGRRVGVVWVDAHADFNTPETSSGNVHGMPLAVLSGLGHPRLTEVFRAVDPKDVVLVGVRSLDPGEKRLLKEAGVRVYTMHEVDRLGVARIAEEVLKHLQGLPLHVSLDADVLDPTLAPGVGTPVPGGLTYREAHLLMEILAESGRVQSLDLVEVNPILDERNRTAEMLVGLALSLLGKR was not found in 2eiv_M. Realigning the template to the actual sequence.
Using UF-Symmetry with group C2. If you do not want to use UF-Symmetry, please use `C1` and copy the AU sequences to the count in the assembly.
Using UF-Symmetry with group C3. If you do not want to use UF-Symmetry, please use `C1` and copy the AU sequences to the count in the assembly.
WARNING:absl:The exact sequence CLAEGTRIFDPVTGTTHRIEDVVDGRKPIHVVAAAKDGTLHARPVVSWFDQGTRDVIGLRIAGGAILWATPDHKVLTEYGWRAAGELRKGDRVAQPRRFDGFMLAEELRYSVIREVLPTRRARTFDLEVEELHTLVAEGVVVH was not found in 2imz_A. Realigning the template to the actual sequence.
WARNING:absl:The exact sequence CLAEGTRIFDPVTGTTHRIEDVVDGRKPIHVVAAAKDGTLHARPVVSWFDQGTRDVIGLRIAGGAILWATPDHKVLTEYGWRAAGELRKGDRVAQPRRFDGFEELRYSVIREVLPTRRARTFDLEVEELHTLVAEGVVVH was not found in 2imz_B. Realigning the template to the actual sequence.
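The bookkeeping in `process_batch_json`, separate from the MSA search and feature generation, can be sketched on its own so that it can be checked without network or GPU access. This is a simplified re-implementation for illustration: `normalize_tasks` and the `chains` key are hypothetical, `validate_input` and `get_features` are skipped entirely, and unlike the original it copies each task instead of modifying the input in place.

```python
from typing import Any, Dict, List, Union

Tasks = Union[Dict[str, Dict[str, Any]], List[Dict[str, Any]]]


def normalize_tasks(tasks: Tasks, jobname: str) -> List[Dict[str, Any]]:
    """Hypothetical re-implementation of process_batch_json's bookkeeping.

    Skips validate_input and get_features; copies each task instead of
    modifying the caller's data in place.
    """
    if isinstance(tasks, dict):
        # Dict input: each key becomes the task's id (overriding any existing id).
        tasks = [dict(value, id=key) for key, value in tasks.items()]
    normalized = []
    for idx, task in enumerate(tasks):
        task = dict(task)
        task.setdefault("id", idx)  # list input without an id: use its index
        if "sequence" not in task:
            raise KeyError(f"'sequence' not found in task #{idx + 1}")
        task.setdefault("symmetry", "C1")
        task["target_id"] = f"{jobname}_{task['id']}"
        # Hypothetical key holding the ';'-separated AU chains as a list.
        task["chains"] = task["sequence"].strip().split(";")
        normalized.append(task)
    return normalized


demo = {"8oij": {"symmetry": "C2", "sequence": "ACDEFGHIKL;MNPQRSTVWY"}}
print(normalize_tasks(demo, "unifold_batch_colab")[0]["target_id"])
# prints unifold_batch_colab_8oij
```

Copying the tasks makes it safe to call the function repeatedly on the same `input_json`, which the in-place original does not guarantee.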
[3]
#@title Uni-Fold prediction on GPU.
import time
from tqdm import tqdm
from unifold.colab.model import colab_inference

def manual_operations():
    # Developers may operate on the pickle files here
    # to customize the features for inference.
    pass

manual_operations()

for task in tqdm(all_tasks, desc='running Unifold'):
    start = time.time()
    best_result = colab_inference(
        target_id=task['target_id'],
        data_dir=task['feature_output_dir'],
        param_dir='/root/Uni-Fold/models/',
        output_dir=task['feature_output_dir'],
        symmetry_group=task['symmetry'],
        is_multimer=task['is_multimer'],
        max_recycling_iters=max_recycling_iters,
        num_ensembles=num_ensembles,
        times=times,
        manual_seed=manual_seed,
        device="cuda:0",  # do not change this on colab.
    )
    task['best_plddt'] = best_result['plddt'].mean().item()
    task['pae'] = best_result['pae'].mean().item() if best_result['pae'] is not None else None
    task['best_results_path'] = best_result['best_results_path']
    task['protein'] = best_result['protein']
    task['run_time'] = time.time() - start
    print(f"total time: {time.time() - start}")

running Unifold:   0%|          | 0/4 [00:00<?, ?it/s]start to load params /root/Uni-Fold/models/monomer.unifold.pt
start to predict unifold_batch_colab_7teu
{'aatype': torch.Size([1, 1, 317]), 'residue_index': torch.Size([1, 1, 317]), 'seq_length': torch.Size([1, 1]), 'msa_chains': torch.Size([8, 1, 508, 1]), 'template_aatype': torch.Size([1, 1, 4, 317]), 'template_all_atom_mask': torch.Size([1, 1, 4, 317, 37]), 'template_all_atom_positions': torch.Size([1, 1, 4, 317, 37, 3]), 'bert_mask': torch.Size([8, 1, 508, 317]), 'msa_mask': torch.Size([8, 1, 508, 317]), 'num_recycling_iters': torch.Size([1, 1]), 'is_distillation': torch.Size([8, 1]), 'seq_mask': torch.Size([1, 1, 317]), 'msa_row_mask': torch.Size([8, 1, 508]), 'template_mask': torch.Size([1, 1, 4]), 'template_pseudo_beta': torch.Size([1, 1, 4, 317, 3]), 'template_pseudo_beta_mask': torch.Size([1, 1, 4, 317]), 'template_torsion_angles_sin_cos': torch.Size([1, 1, 4, 317, 7, 2]), 'template_alt_torsion_angles_sin_cos': torch.Size([1, 1, 4, 317, 7, 2]), 'template_torsion_angles_mask': torch.Size([1, 1, 4, 317, 7]), 'residx_atom14_to_atom37': torch.Size([1, 1, 317, 14]), 'residx_atom37_to_atom14': torch.Size([1, 1, 317, 37]), 'atom14_atom_exists': torch.Size([1, 1, 317, 14]), 'atom37_atom_exists': torch.Size([1, 1, 317, 37]), 'target_feat': torch.Size([1, 1, 317, 22]), 'extra_msa': torch.Size([8, 1, 1024, 317]), 'extra_msa_mask': torch.Size([8, 1, 1024, 317]), 'extra_msa_row_mask': torch.Size([8, 1, 1024]), 'true_msa': torch.Size([8, 1, 508, 317]), 'extra_msa_has_deletion': torch.Size([8, 1, 1024, 317]), 'extra_msa_deletion_value': torch.Size([8, 1, 1024, 317]), 'msa_feat': torch.Size([8, 1, 508, 317, 49])}
Inference time: 75.343291413
running Unifold:  25%|██▌       | 1/4 [01:28<04:26, 88.73s/it]plddts {'monomer.unifold.pt_97923': '0.9176682'}
total time: 88.72839856147766
start to load params /root/Uni-Fold/models/uf_symmetry.pt
start to predict unifold_batch_colab_8d27
{'aatype': torch.Size([1, 1, 287]), 'residue_index': torch.Size([1, 1, 287]), 'seq_length': torch.Size([1, 1]), 'msa_chains': torch.Size([8, 1, 252, 1]), 'template_aatype': torch.Size([1, 1, 4, 287]), 'template_all_atom_mask': torch.Size([1, 1, 4, 287, 37]), 'template_all_atom_positions': torch.Size([1, 1, 4, 287, 37, 3]), 'asym_id': torch.Size([1, 1, 287]), 'sym_id': torch.Size([1, 1, 287]), 'entity_id': torch.Size([1, 1, 287]), 'num_sym': torch.Size([1, 1, 287]), 'assembly_num_chains': torch.Size([1, 1, 1]), 'cluster_bias_mask': torch.Size([1, 1, 252]), 'bert_mask': torch.Size([8, 1, 252, 287]), 'msa_mask': torch.Size([8, 1, 252, 287]), 'asym_len': torch.Size([1, 1, 1]), 'num_recycling_iters': torch.Size([1, 1]), 'is_distillation': torch.Size([8, 1]), 'seq_mask': torch.Size([1, 1, 287]), 'msa_row_mask': torch.Size([8, 1, 252]), 'template_mask': torch.Size([1, 1, 4]), 'template_pseudo_beta': torch.Size([1, 1, 4, 287, 3]), 'template_pseudo_beta_mask': torch.Size([1, 1, 4, 287]), 'template_torsion_angles_sin_cos': torch.Size([1, 1, 4, 287, 7, 2]), 'template_alt_torsion_angles_sin_cos': torch.Size([1, 1, 4, 287, 7, 2]), 'template_torsion_angles_mask': torch.Size([1, 1, 4, 287, 7]), 'residx_atom14_to_atom37': torch.Size([1, 1, 287, 14]), 'residx_atom37_to_atom14': torch.Size([1, 1, 287, 37]), 'atom14_atom_exists': torch.Size([1, 1, 287, 14]), 'atom37_atom_exists': torch.Size([1, 1, 287, 37]), 'target_feat': torch.Size([1, 1, 287, 22]), 'extra_msa': torch.Size([8, 1, 1152, 287]), 'extra_msa_mask': torch.Size([8, 1, 1152, 287]), 'extra_msa_row_mask': torch.Size([8, 1, 1152]), 'true_msa': torch.Size([8, 1, 252, 287]), 'msa_feat': torch.Size([8, 1, 252, 287, 49]), 'extra_msa_has_deletion': torch.Size([8, 1, 1152, 287]), 'extra_msa_deletion_value': torch.Size([8, 1, 1152, 287]), 'symmetry_opers': torch.Size([1, 1, 2, 4, 4]), 'pseudo_residue_feat': torch.Size([1, 1, 8]), 'num_asym': torch.Size([1, 1])}
Inference time: 40.910752782
plddts {'uf_symmetry.pt_97923': '0.9217122'}
running Unifold:  50%|█████     | 2/4 [02:16<02:09, 64.90s/it]total time: 48.21156907081604
start to load params /root/Uni-Fold/models/uf_symmetry.pt
start to predict unifold_batch_colab_8oij
{'aatype': torch.Size([1, 1, 212]), 'residue_index': torch.Size([1, 1, 212]), 'seq_length': torch.Size([1, 1]), 'msa_chains': torch.Size([8, 1, 252, 1]), 'template_aatype': torch.Size([1, 1, 4, 212]), 'template_all_atom_mask': torch.Size([1, 1, 4, 212, 37]), 'template_all_atom_positions': torch.Size([1, 1, 4, 212, 37, 3]), 'asym_id': torch.Size([1, 1, 212]), 'sym_id': torch.Size([1, 1, 212]), 'entity_id': torch.Size([1, 1, 212]), 'num_sym': torch.Size([1, 1, 212]), 'assembly_num_chains': torch.Size([1, 1, 1]), 'cluster_bias_mask': torch.Size([1, 1, 252]), 'bert_mask': torch.Size([8, 1, 252, 212]), 'msa_mask': torch.Size([8, 1, 252, 212]), 'asym_len': torch.Size([1, 1, 2]), 'num_recycling_iters': torch.Size([1, 1]), 'is_distillation': torch.Size([8, 1]), 'seq_mask': torch.Size([1, 1, 212]), 'msa_row_mask': torch.Size([8, 1, 252]), 'template_mask': torch.Size([1, 1, 4]), 'template_pseudo_beta': torch.Size([1, 1, 4, 212, 3]), 'template_pseudo_beta_mask': torch.Size([1, 1, 4, 212]), 'template_torsion_angles_sin_cos': torch.Size([1, 1, 4, 212, 7, 2]), 'template_alt_torsion_angles_sin_cos': torch.Size([1, 1, 4, 212, 7, 2]), 'template_torsion_angles_mask': torch.Size([1, 1, 4, 212, 7]), 'residx_atom14_to_atom37': torch.Size([1, 1, 212, 14]), 'residx_atom37_to_atom14': torch.Size([1, 1, 212, 37]), 'atom14_atom_exists': torch.Size([1, 1, 212, 14]), 'atom37_atom_exists': torch.Size([1, 1, 212, 37]), 'target_feat': torch.Size([1, 1, 212, 22]), 'extra_msa': torch.Size([8, 1, 336, 212]), 'extra_msa_mask': torch.Size([8, 1, 336, 212]), 'extra_msa_row_mask': torch.Size([8, 1, 336]), 'true_msa': torch.Size([8, 1, 252, 212]), 'msa_feat': torch.Size([8, 1, 252, 212, 49]), 'extra_msa_has_deletion': torch.Size([8, 1, 336, 212]), 'extra_msa_deletion_value': torch.Size([8, 1, 336, 212]), 'symmetry_opers': torch.Size([1, 1, 2, 4, 4]), 'pseudo_residue_feat': torch.Size([1, 1, 8]), 'num_asym': torch.Size([1, 1])}
running Unifold:  75%|███████▌  | 3/4 [02:43<00:47, 47.28s/it]Inference time: 21.187872653
plddts {'uf_symmetry.pt_97923': '0.8547156'}
total time: 26.32280707359314
start to load params /root/Uni-Fold/models/uf_symmetry.pt
start to predict unifold_batch_colab_c2404
{'aatype': torch.Size([1, 1, 156]), 'residue_index': torch.Size([1, 1, 156]), 'seq_length': torch.Size([1, 1]), 'msa_chains': torch.Size([8, 1, 252, 1]), 'template_aatype': torch.Size([1, 1, 4, 156]), 'template_all_atom_mask': torch.Size([1, 1, 4, 156, 37]), 'template_all_atom_positions': torch.Size([1, 1, 4, 156, 37, 3]), 'asym_id': torch.Size([1, 1, 156]), 'sym_id': torch.Size([1, 1, 156]), 'entity_id': torch.Size([1, 1, 156]), 'num_sym': torch.Size([1, 1, 156]), 'assembly_num_chains': torch.Size([1, 1, 1]), 'cluster_bias_mask': torch.Size([1, 1, 252]), 'bert_mask': torch.Size([8, 1, 252, 156]), 'msa_mask': torch.Size([8, 1, 252, 156]), 'asym_len': torch.Size([1, 1, 1]), 'num_recycling_iters': torch.Size([1, 1]), 'is_distillation': torch.Size([8, 1]), 'seq_mask': torch.Size([1, 1, 156]), 'msa_row_mask': torch.Size([8, 1, 252]), 'template_mask': torch.Size([1, 1, 4]), 'template_pseudo_beta': torch.Size([1, 1, 4, 156, 3]), 'template_pseudo_beta_mask': torch.Size([1, 1, 4, 156]), 'template_torsion_angles_sin_cos': torch.Size([1, 1, 4, 156, 7, 2]), 'template_alt_torsion_angles_sin_cos': torch.Size([1, 1, 4, 156, 7, 2]), 'template_torsion_angles_mask': torch.Size([1, 1, 4, 156, 7]), 'residx_atom14_to_atom37': torch.Size([1, 1, 156, 14]), 'residx_atom37_to_atom14': torch.Size([1, 1, 156, 37]), 'atom14_atom_exists': torch.Size([1, 1, 156, 14]), 'atom37_atom_exists': torch.Size([1, 1, 156, 37]), 'target_feat': torch.Size([1, 1, 156, 22]), 'extra_msa': torch.Size([8, 1, 1152, 156]), 'extra_msa_mask': torch.Size([8, 1, 1152, 156]), 'extra_msa_row_mask': torch.Size([8, 1, 1152]), 'true_msa': torch.Size([8, 1, 252, 156]), 'msa_feat': torch.Size([8, 1, 252, 156, 49]), 'extra_msa_has_deletion': torch.Size([8, 1, 1152, 156]), 'extra_msa_deletion_value': torch.Size([8, 1, 1152, 156]), 'symmetry_opers': torch.Size([1, 1, 3, 4, 4]), 'pseudo_residue_feat': torch.Size([1, 1, 8]), 'num_asym': torch.Size([1, 1])}
running Unifold: 100%|██████████| 4/4 [03:02<00:00, 45.59s/it]Inference time: 13.517628150000007
plddts {'uf_symmetry.pt_97923': '0.91203463'}
total time: 19.09130835533142
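The `manual_operations` hook in the prediction cell is empty; its comment says developers may edit the feature pickle files there before inference. Below is a minimal sketch of such an edit loop, assuming the features in each `feature_output_dir` are stored as gzip-compressed pickles matching `*.pkl.gz` (check the actual file names in your output directory first). `edit_feature_pickles`, `tag_features`, and the demo file name are hypothetical, and pickle files should only be loaded from trusted sources.

```python
import glob
import gzip
import os
import pickle
import tempfile
from typing import Any, Callable, Dict, List


def edit_feature_pickles(
    feature_dir: str,
    edit: Callable[[Dict[str, Any]], Dict[str, Any]],
    pattern: str = "*.pkl.gz",  # assumption about how feature files are named
) -> List[str]:
    """Load each gzip-compressed pickle in feature_dir, apply edit, write it back."""
    edited = []
    for path in sorted(glob.glob(os.path.join(feature_dir, pattern))):
        with gzip.open(path, "rb") as fp:
            features = pickle.load(fp)
        features = edit(features)
        with gzip.open(path, "wb") as fp:
            pickle.dump(features, fp)
        edited.append(path)
    return edited


# Demo on a temporary directory with a toy feature file (hypothetical name).
demo_dir = tempfile.mkdtemp()
with gzip.open(os.path.join(demo_dir, "toy.feature.pkl.gz"), "wb") as fp:
    pickle.dump({"seq_length": 10}, fp)


def tag_features(features):
    features["edited"] = True  # hypothetical extra key, for illustration only
    return features


print(len(edit_feature_pickles(demo_dir, tag_features)))  # prints 1
```

Inside `manual_operations`, one would call it once per entry of `all_tasks`, passing `task['feature_output_dir']` and your own function that takes and returns a feature dict.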

[4]
task_best_proteins = []
with open(os.path.join(output_dir_base, 'all_tasks_summary.json'), 'w') as f:
    # Remove the protein objects (which cannot be written to JSON) to keep the summary clean.
    for item in all_tasks:
        if 'protein' in item:
            protein = item.pop('protein')
            task_best_proteins.append({'id': item['id'], 'protein': protein})
    json.dump(all_tasks, f, indent=2)
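Because the summary is plain JSON, it can be reloaded later, for instance to rank the predictions. A minimal sketch that sorts the task records by the `best_plddt` value stored by the inference cell, highest first; `rank_by_plddt` is a hypothetical helper, and the demo summary below contains made-up toy values.

```python
import json
import os
import tempfile
from typing import Any, Dict, List


def rank_by_plddt(summary_path: str) -> List[Dict[str, Any]]:
    """Load all_tasks_summary.json and sort tasks by best_plddt, highest first."""
    with open(summary_path, encoding="utf-8") as fp:
        tasks = json.load(fp)
    return sorted(tasks, key=lambda t: t["best_plddt"], reverse=True)


# Demo with a toy summary file; the values are made up for illustration.
demo_path = os.path.join(tempfile.mkdtemp(), "all_tasks_summary.json")
with open(demo_path, "w", encoding="utf-8") as fp:
    json.dump(
        [
            {"target_id": "job_a", "best_plddt": 0.85, "pae": None},
            {"target_id": "job_b", "best_plddt": 0.92, "pae": None},
        ],
        fp,
    )
for record in rank_by_plddt(demo_path):
    print(record["target_id"], record["best_plddt"])
# prints:
# job_b 0.92
# job_a 0.85
```

For example, `rank_by_plddt(os.path.join(output_dir_base, 'all_tasks_summary.json'))[0]['best_results_path']` would give the output path of the highest-scoring task.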