Improving the Prediction of Protein Stability Changes Upon Mutations by Geometric Learning and a Pre-Training Strategy

Yunxin Xu, Di Liu, Haipeng Gong
DOI: https://doi.org/10.1101/2023.05.28.542668
2024-01-01
Nature Computational Science
Abstract: Accurate prediction of the fitness and stability of a protein upon mutations is of high importance in protein engineering and design. Despite the rapid development of deep learning techniques and the accumulation of experimental data, the multi-labeled nature of fitness data hinders the training of robust deep-learning-based models for the fitness and stability prediction tasks. Here, we propose three geometric-learning-based models, GeoFitness, GeoDDG and GeoDTm, for the prediction of the fitness score, ΔΔG and ΔTm of a protein upon mutations, respectively. In the optimization of GeoFitness, we designed a novel loss function that allows supervised training of a unified model on the large amount of multi-labeled fitness data in the deep mutational scanning (DMS) database. In this way, GeoFitness efficiently learns the general functional effects of protein mutations and outperforms the other state-of-the-art methods. To further improve the downstream tasks of ΔΔG/ΔTm prediction, we reused the encoder of GeoFitness as a pre-trained module in GeoDDG and GeoDTm to overcome the scarcity of specifically labeled data. This pre-training strategy, in combination with data expansion, remarkably improves model performance and generalizability. When evaluated on the benchmark test sets (S669 for ΔΔG prediction and a newly collected set, S571, for ΔTm prediction), GeoDDG and GeoDTm outperform the other state-of-the-art methods by at least 30% and 70%, respectively, in terms of the Spearman correlation coefficient between predicted and experimental values. An online server for the suite of these three predictors, GeoStab-suite, is available at http://structpred.life.tsinghua.edu.cn/server_geostab.html .
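The benchmark comparison above is reported in terms of the Spearman correlation coefficient between predicted and experimental values. As an illustration of that metric only, and not of the authors' evaluation code, the following minimal Python sketch computes it as the Pearson correlation of the ranks. The function names and the five ΔΔG values are hypothetical and are not taken from S669 or S571.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(predicted, experimental):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(predicted), average_ranks(experimental))


# Hypothetical values for illustration only (not real benchmark data).
experimental_ddg = [-1.2, 0.3, -0.5, 2.1, -2.4]
predicted_ddg = [-0.9, 0.1, -1.0, 1.5, -1.8]

rho = spearman(predicted_ddg, experimental_ddg)
print(round(rho, 3))  # 0.9
```

Because the coefficient depends only on rank order, a predictor can score well even if its predicted magnitudes are systematically scaled or shifted relative to experiment.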