Zero-shot transfer of protein sequence likelihood models to thermostability prediction

Shawn Reeves, Subha Kalyaanamoorthy
DOI: https://doi.org/10.1038/s42256-024-00887-7
IF: 23.8
2024-09-21
Nature Machine Intelligence
Abstract: Protein sequence likelihood models (PSLMs) are an emerging class of self-supervised deep learning algorithms that learn probability distributions over amino acid identities conditioned on structural or evolutionary context. Recently, PSLMs have demonstrated impressive performance in predicting the relative fitness of variant sequences without any task-specific training, but their potential to address a central goal of protein engineering, enhancing stability, remains underexplored. Here we comprehensively analyse the capacity for zero-shot transfer of eight PSLMs towards prediction of relative thermostability for variants of hundreds of heterogeneous proteins across several quantitative datasets. PSLMs are compared with popular task-specific stability models, and we show that some PSLMs have competitive performance when the appropriate statistics are considered. We highlight relative strengths and weaknesses of PSLMs and examine their complementarity with task-specific models, specifically focusing our analyses on stability-engineering applications. Our results indicate that all PSLMs can appreciably augment the predictions of existing methods by integrating insights from their disparate training objectives, suggesting a path forward in the stagnating field of computational stability prediction.
computer science, artificial intelligence, interdisciplinary applications