Effective High-Accuracy Prediction of Protein Structures from Easily Obtainable Artificial Homologous Sequences by Structure-Stability-Based Selection

Jinle Tang,Zhe Zhang,Jian Zhan,Yaoqi Zhou
DOI: https://doi.org/10.1101/2023.11.22.568372
2024-11-13
Abstract:High-resolution protein structure determination by experimental techniques is notoriously costly and labor intensive. This problem is mostly solved with arrival of deep-learning-based computational prediction by AlphaFold2 but only for those proteins with enough naturally occurring homologous sequences. Here, we attempt to close the remaining gap by employing artificially generated, structure-stability-selected homologous sequences as an input for AlphaFold2. We showed that only one round of selection of deeply mutated sequences of a few mutations is sufficient to bring the accuracy of predicted structures to better than 2 Å RMSD from their respective native structures for four of the five proteins experimented. The performance for three out of five proteins is even better than AlphaFold2 with naturally occurring sequences. The only protein with predicted structure of >2 Å (at 2.92 Å) RMSD is due to a fully exposed (i.e., likely flexible) β-hairpin. The result supports a future of determining protein structures at low cost and fast turnaround by integrating simple molecular biology experiments (deep mutational scanning and in vivo or in vitro selection) with high-throughput sequencing. The technique proposed here can be further extended to predict structures of protein complexes as well as proteins with posttranslational modifications.
Molecular Biology
What problem does this paper attempt to address?