Large scale analysis of predicted protein structures links model features to behaviour

Michael J. Stam, Diego A. Oyarzún, Nadanai Laohakunakorn, Christopher W. Wood
DOI: https://doi.org/10.1101/2024.04.10.588835
2024-04-14
Abstract: Rapid advancements in protein structure prediction methods have ushered in a new era of abundant and accurate structural data, providing opportunities to analyse proteins at a scale that has not been possible before. Here we show that features derived solely from predicted structures can be used to understand protein behaviour using data-driven methods. We found that these features were predictive of protein production for a set of designed antibodies, enabling identification of high-quality designs. Following on from this result, we calculated these features for a diverse set of ≈500,000 predicted structures, and our analysis showed systematic variation between proteins from different organisms to such an extent that the tree of life could be recapitulated from these data. Given the high degree of functional constraint around the chemistry of proteins, this result is surprising, and could have important implications for the design and engineering of novel proteins.
Bioinformatics