Protein Sequence Design by Entropy-based Iterative Refinement

Xinyi Zhou, Guangyong Chen, Junjie Ye, Ercheng Wang, Jun Zhang, Cong Mao, Zhanwei Li, Jianye Hao, Xingxu Huang, Jin Tang, Pheng Ann Heng
DOI: https://doi.org/10.1101/2023.02.04.527099
2023-01-01
Abstract: Inverse Protein Folding (IPF) is an important task in protein design that aims to design sequences compatible with a given backbone structure. Despite the rapid development of algorithms for this task, existing methods tend to rely on a limited and noisy residue environment when generating sequences. In this paper, we develop an iterative sequence refinement pipeline that can refine the sequences generated by existing sequence design models. It selects and retains reliable predictions based on the model's confidence in its predicted distributions, and decodes the remaining residue types based on a partially visible environment. The proposed scheme consistently improves the performance of a number of IPF models on several sequence design benchmarks, and increases the sequence recovery of the SOTA model by up to 10%. Finally, we show that the proposed model can be applied to redesign Transposon-associated transposase B: of the 20 variants we proposed, 8 exhibit improved gene editing activity. Our code and a demo of the refinement pipeline are provided in an online Colab.

Competing Interest Statement: The authors have declared no competing interest.
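The refinement loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `entropy`, `refine`, and `toy_predict`, the parameters `n_iters` and `keep_frac`, and the use of `-1` as a mask token are all assumptions made for illustration. The sketch assumes a base model that maps a partially visible sequence to per-position distributions over the 20 amino acids, and uses the Shannon entropy of each distribution as the confidence score.

```python
import numpy as np

def entropy(probs, eps=1e-9):
    """Shannon entropy (in nats) of each row of an (L, 20) probability matrix."""
    return -np.sum(probs * np.log(probs + eps), axis=-1)

def refine(predict, length, n_iters=3, keep_frac=0.5):
    """Hypothetical entropy-based refinement loop.

    predict(visible) -> (L, 20) array of probabilities, where `visible` is an
    int array of residue indices with -1 marking masked positions.
    """
    visible = np.full(length, -1, dtype=int)       # start with nothing visible
    n_keep = max(1, int(round(keep_frac * length)))
    for _ in range(n_iters):
        probs = predict(visible)
        # Low entropy = confident prediction; retain those positions only.
        keep = np.argsort(entropy(probs))[:n_keep]
        visible = np.full(length, -1, dtype=int)
        visible[keep] = probs[keep].argmax(axis=-1)
    # Decode the masked positions conditioned on the retained residues.
    seq = predict(visible).argmax(axis=-1)
    seq[visible >= 0] = visible[visible >= 0]      # retained residues are kept
    return seq

def toy_predict(visible):
    """Stand-in for a real IPF model (assumption): peaks at residue i % 20."""
    length = len(visible)
    logits = np.zeros((length, 20))
    boost = 1.0 if np.any(visible >= 0) else 0.0   # context sharpens predictions
    idx = np.arange(length)
    logits[idx, idx % 20] = 1.0 + 0.1 * idx + boost
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

designed = refine(toy_predict, length=30)
```

In a real setting, `toy_predict` would be replaced by an existing structure-conditioned sequence design model, and the fraction of retained positions would likely be tuned or scheduled across iterations.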