De novo Protein Sequence Design Based on Deep Learning and Validation on CalB Hydrolase

Junxi Mu, Zhenxin Li, Bo Zhang, Qi Zhang, Jamshed Iqbal, Abdul Wadood, Ting Wei, Yan Feng, Haifeng Chen
DOI: https://doi.org/10.1101/2023.08.01.551444
2023-08-02
bioRxiv
Abstract: Protein design is central to nearly all protein engineering problems, as it enables the creation of proteins with new or improved biological functions, such as enzymes with higher catalytic efficiency. As one of the key tasks of protein design, fixed-backbone protein sequence design aims to design novel sequences that fold into a given protein backbone structure. However, current sequence design methods are limited by low sequence diversity and by a lack of experimental validation of the designed proteins' function, and so cannot meet the needs of functional protein design. We first constructed the Graphormer-based Protein Design (GPD) model, which applies a Transformer directly to a graph-based representation of 3D protein structure, and added Gaussian noise and a random sequence mask to the node features to improve sequence recovery and diversity. Additionally, functional filtering based on structure folding, solubility, and function was performed to improve the experimental success rate. The process of "sequence design - functional filtering - functional experiment" was carried out for CalB hydrolase. The experimental results showed that the specific activity of the designed protein was improved 1.7-fold relative to wild-type CalB. This design and filtering platform will be a valuable tool for generating industrial enzymes and protein drugs with specific functions.
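The node-feature perturbation mentioned in the abstract (Gaussian noise plus a random sequence mask) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `perturb_node_features`, the feature layout (structural features concatenated with a one-hot sequence encoding), and the default values of `noise_std` and `mask_rate` are all assumptions chosen for illustration.

```python
import numpy as np


def perturb_node_features(struct_feat, seq_onehot, noise_std=0.1,
                          mask_rate=0.15, rng=None):
    """Add Gaussian noise to structural features and mask random residues.

    struct_feat: (L, D) array of per-residue structural features (assumed layout).
    seq_onehot:  (L, 20) one-hot encoding of the residue identities.
    noise_std and mask_rate are hypothetical values, not taken from the paper.
    Returns the (L, D + 20) node features and the boolean mask of hidden residues.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Gaussian noise on the structural part of each node.
    noisy = struct_feat + rng.normal(0.0, noise_std, size=struct_feat.shape)

    # Hide the sequence identity of a random subset of residues.
    mask = rng.random(seq_onehot.shape[0]) < mask_rate
    masked_seq = seq_onehot.copy()
    masked_seq[mask] = 0.0

    return np.concatenate([noisy, masked_seq], axis=-1), mask


# Usage with random placeholder data for a 50-residue backbone.
rng = np.random.default_rng(0)
struct_feat = rng.normal(size=(50, 6))
seq_onehot = np.eye(20)[rng.integers(0, 20, size=50)]
node_feat, mask = perturb_node_features(struct_feat, seq_onehot, rng=rng)
```

In a setup like this, the noise would discourage the model from memorizing exact geometry, while the masked positions would force it to predict residues from structural context alone; drawing a new mask for each sample is one plausible source of the sequence diversity the abstract refers to.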