CovTransformer: A transformer model for SARS-CoV-2 lineage frequency forecasting

Yinan Feng, Emma E. Goldberg, Michael Kupperman, Xitong Zhang, Youzuo Lin, Ruian Ke
DOI: https://doi.org/10.1101/2024.04.01.24305089
2024-04-01
Abstract: With hundreds of SARS-CoV-2 lineages circulating in the global population, there is an urgent need to forecast lineage frequencies and thereby identify rapidly expanding lineages. To address this need, we constructed a framework for SARS-CoV-2 lineage frequency forecasting (CovTransformer), based on the transformer architecture. We designed our framework to navigate challenges such as a limited amount of data with high levels of noise and bias. We first trained and tested the model using data from the UK and the US, and then tested the generalization ability of the model on data collected across the globe. Remarkably, the model makes predictions two months into the future with high levels of accuracy in 31 countries. Finally, we show that our model performed substantially better than the current gold standard, i.e., a regression-based model implemented in Nextstrain. Overall, our work demonstrates that transformer models represent a promising approach for lineage forecasting and pandemic monitoring.
Epidemiology