Protein superfolds are characterised as frustration-free topologies: A case study of pure parallel β-sheet topologies

Hiroto Murata, Kazuma Toko, George Chikenji
DOI: https://doi.org/10.1101/2024.01.05.574326
2024-01-05
Abstract: A protein superfold is a type of protein fold that is observed in at least three distinct, non-homologous protein families. Structural classification studies have revealed a limited number of prevalent superfolds alongside several infrequently occurring folds. In α/β type superfolds, the C-terminal β-strand tends to favor the edge of the β-sheet, while the N-terminal β-strand is often found in the middle. Whether these observations arise from evolutionary sampling bias or from physical interactions remains unclear. This article offers a physics-based explanation for these observations, specifically for pure parallel β-sheet topologies. Our investigation is grounded in three established structural rules that are based on physical interactions. We define “frustration-free topologies” as topologies that can satisfy all three rules simultaneously; topologies that cannot are termed “frustrated topologies.” Our findings reveal that (i) frustration-free topologies represent only a fraction of all theoretically possible patterns, (ii) these topologies strongly favor positioning the C-terminal β-strand at the edge of the β-sheet and the N-terminal β-strand in the middle, and (iii) there is significant overlap between frustration-free topologies and superfolds. We also used a lattice protein model to thoroughly investigate sequence-structure relationships. Our results show that frustration-free structures are highly designable, while frustrated structures are poorly designable. These findings suggest that superfolds are highly designable due to their lack of frustration, and that the preference for positioning C-terminal β-strands at the edge of the β-sheet is a direct result of frustration-free topologies. These insights not only enhance our understanding of sequence-structure relationships but also have significant implications for de novo protein design.
Bioinformatics