A Protein Language Model for Exploring Viral Fitness Landscapes

Jumpei Ito, Adam Strange, Wei Liu, Gustav Joas, Spyros Lytras, The Genotype to Phenotype Japan (G2P-Japan) Consortium, Kei Sato
DOI: https://doi.org/10.1101/2024.03.15.584819
2024-03-18
Abstract: Successively emerging SARS-CoV-2 variants lead to repeated epidemic surges through escalated spreading potential (i.e., fitness). Modeling the genotype–fitness relationship enables us to pinpoint the mutations boosting viral fitness and flag high-risk variants immediately after their detection. Here, we introduce CoVFit, a protein language model able to predict the fitness of variants based solely on their spike protein sequences. CoVFit was trained with genotype–fitness data derived from viral genome surveillance and functional mutation data related to immune evasion. When limited to only data available before the emergence of XBB, CoVFit successfully predicted the higher fitness of the XBB lineage. Fully trained CoVFit identified 549 fitness elevation events throughout SARS-CoV-2 evolution until late 2023. Furthermore, a CoVFit-based simulation was able to predict the higher fitness of JN.1 subvariants before their detection. Our study provides both insight into the SARS-CoV-2 fitness landscape and a novel tool with the potential to transform viral genome surveillance.
Microbiology