Improving the Prediction of Protein Stability Changes Upon Mutations by Geometric Learning and a Pre-Training Strategy

Yunxin Xu, Di Liu, Haipeng Gong
DOI: https://doi.org/10.1038/s43588-024-00716-2
2024-01-01
Nature Computational Science
Abstract: Accurate prediction of protein mutation effects is of great importance in protein engineering and design. Here we propose GeoStab-suite, a suite of three geometric learning-based models (GeoFitness, GeoDDG and GeoDTm) for the prediction of fitness score, ΔΔG and ΔTm of a protein upon mutations, respectively. GeoFitness engages a specialized loss function to allow supervised training of a unified model using the large amount of multi-labeled fitness data in the deep mutational scanning database. To further improve the downstream tasks of ΔΔG and ΔTm prediction, the encoder of GeoFitness is reutilized as a pre-trained module in GeoDDG and GeoDTm to overcome the challenge of lacking sufficient labeled data. This pre-training strategy, in combination with data expansion, markedly improves model performance and generalizability. In the benchmark test, GeoDDG and GeoDTm outperform the other state-of-the-art methods by at least 30% and 70%, respectively, in terms of the Spearman correlation coefficient. In this study, the authors propose a strategy to train a unified model to learn the general mutational effects based on multi-labeled deep mutational scanning (DMS) data, and then reutilize this pre-trained model to improve the downstream protein stability prediction tasks.
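The benchmark comparison in the abstract is stated in terms of the Spearman correlation coefficient, which measures how well the ranking of predicted values agrees with the ranking of measured values. The following is a minimal, dependency-free sketch of that metric only; it is not the authors' evaluation code, and the function names and the ΔΔG numbers are hypothetical values chosen for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical experimental and predicted stability changes (illustrative only).
experimental = [-1.2, 0.3, -2.5, 0.8, -0.4]
predicted = [-0.9, 0.5, -1.8, 0.2, -0.6]

rho = spearman(experimental, predicted)
print(round(rho, 3))  # 0.9
```

Because the metric depends only on ranks, a model can score well even if its predicted values are on a different scale from the measurements, as long as it orders the mutations correctly.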