Comparison and evaluation of data-driven protein stability prediction models

Jennifer A. Csicsery-Ronay,Alexander Zaitzeff,Jedediah M. Singer
DOI: https://doi.org/10.1101/2022.03.14.483859
2022-03-16
Abstract:Abstract Predicting protein stability is important to protein engineering yet poses unsolved challenges. Computational costs associated with physics-based models, and the limited amount of data available to support data-driven models, have left stability prediction behind the prediction of structure. New data and advancements in modeling approaches now afford greater opportunities to solve this challenge. We evaluate a set of data-driven prediction models using a large, newly published dataset of various synthetic proteins and their experimental stability data. We test the models in two separate tasks, exercising extrapolation to new protein classes and prediction of the effects on stability of small mutations. Small convolutional neural networks trained from scratch on stability data and large protein embedding models passed through simple downstream models trained on stability data are both able to predict stability comparably well. The largest of the embedding models yields the best performance in all tasks and metrics. We also explored the marginal performance gains seen with two ensemble models.
What problem does this paper attempt to address?