Structure-based self-supervised learning enables ultrafast prediction of stability changes upon mutation at the protein universe scale

Jinyuan Sun, Tong Zhu, Yinglu Cui, Bian Wu
DOI: https://doi.org/10.1101/2023.08.09.552725
2023-08-14
bioRxiv
Abstract: Predicting free energy changes (ΔΔG) upon mutation is of paramount significance for advancing our understanding of protein evolution, and it has profound implications for protein engineering and pharmaceutical development. Traditional methods, however, often suffer from limitations such as slow computation or heavy reliance on biased training datasets. These challenges are magnified when aiming for accurate ΔΔG prediction across the vast universe of protein sequences. In this study, we present Pythia, a self-supervised graph neural network tailored for zero-shot ΔΔG prediction. In comparative benchmarks against other self-supervised pre-trained models and force-field-based methods, Pythia achieves higher correlations while using the fewest parameters, and it accelerates computation by up to 10^5-fold. The efficacy of Pythia is corroborated by its application to predicting thermostable mutations of limonene epoxide hydrolase (LEH), with significantly higher experimental success rates. This efficiency enables the exploration of 26 million high-quality protein structures. Such a grand-scale application marks a leap forward in our capacity to traverse protein sequence space and may enrich our insights into the intricacies of protein genotype-phenotype relationships. We provide a web app at https://pythia.wulab.xyz for users to conveniently run predictions.
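The abstract describes zero-shot ΔΔG prediction from a self-supervised model but does not give the scoring rule. A common way such models rank mutations is the log-odds of the mutant versus wild-type residue predicted at a masked site. The sketch below assumes that scheme; it is not confirmed as Pythia's method. The `site_probs` dictionary is a hypothetical stand-in for model output, and how a log-odds score maps onto a signed ΔΔG is also an assumption here.

```python
import math

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log_odds_score(probs, wt, mut):
    """Return log p(mut) - log p(wt) at one masked site.

    Assumption: `probs` maps one-letter codes to positive probabilities,
    as a hypothetical self-supervised model might output. The score does
    not need normalized probabilities, because any normalizing constant
    cancels in the ratio. Higher scores mean the model prefers the mutant.
    """
    if wt not in probs or mut not in probs:
        raise ValueError("unknown amino acid code")
    return math.log(probs[mut]) - math.log(probs[wt])


def rank_mutations(probs, wt):
    """Score all 19 substitutions at a site and sort them best-first."""
    scores = {
        aa: log_odds_score(probs, wt, aa) for aa in AMINO_ACIDS if aa != wt
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy, made-up probabilities for a single site whose wild type is Leu (L).
site_probs = {aa: 0.02 for aa in AMINO_ACIDS}
site_probs["L"] = 0.44
site_probs["I"] = 0.20

ranking = rank_mutations(site_probs, "L")
print(ranking[0])
```

Because each site needs only one forward pass to score all 19 substitutions, this kind of scoring is cheap compared with per-mutant force-field calculations. That is consistent with, though not a derivation of, the speedups the abstract reports.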