Machine Learning Reveals Key Glycoprotein Mutations and Rapidly Assigns Lassa Virus Lineages

Richard Olumide Daodu, Jens-Uwe Ulrich, Denise Kuehnert
DOI: https://doi.org/10.1101/2024.07.31.605963
2024-11-05
Abstract: Lassa fever, caused by the Lassa virus (LASV), remains a major public health concern in West Africa, causing numerous fatalities annually and several intercontinental cases since its discovery in 1969. Despite ongoing research, no approved vaccines are available, with current efforts focusing on immunotherapy. LASV is divided into distinct lineages that circulate in specific geographic regions, elicit varying immune responses, and exhibit different pathophysiological effects. Understanding the genetic differences between these lineages is crucial for developing, improving, and distributing diagnostics, treatments, and vaccines. In this study, we analyzed the LASV glycoprotein complex (GPC), the only surface protein, using statistics, machine learning, and phylogenetics. At a population scale, we identified key amino acid differences between Nigerian lineages and those in other West African regions, particularly near the stable signal peptide cleavage site and other immune-related regions (e.g., AA positions 59-76). Additionally, we found that GPC sequences from Lineages II and III are shorter than those from Lineage IV, due to the high prevalence of a codon insertion at positions 178-180 (amino acid position 60). This insertion may contribute to inaccuracies observed in molecular diagnostics for LASV and may also play a role in the increased fatality associated with Lineage IV. The insertion has reemerged and persisted in Lineage II, which may indicate a fitness advantage. Furthermore, we developed a fast and highly accurate lineage classification tool called CLASV that allows rapid identification of LASV lineages, improving the ability to monitor emerging outbreaks and exported cases.
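The coordinate arithmetic in the abstract (nucleotide positions 178-180 corresponding to amino acid position 60) can be checked with a small sketch. It assumes 1-based coordinates and a reading frame that starts at nucleotide 1 of the GPC coding sequence. The function names are hypothetical and are not taken from CLASV or from the paper.

```python
def codon_to_aa_position(nt_start: int, nt_end: int) -> int:
    """Map a 1-based, in-frame codon span to its 1-based amino acid position.

    Assumption: the reading frame starts at nucleotide 1 of the coding sequence.
    """
    if nt_end - nt_start != 2:
        raise ValueError("a codon spans exactly three nucleotides")
    if nt_start % 3 != 1:
        raise ValueError("codon start is not in frame with nucleotide 1")
    return (nt_start + 2) // 3


def aa_to_codon_span(aa_position: int) -> tuple[int, int]:
    """Inverse mapping: 1-based amino acid position to its 1-based codon span."""
    if aa_position < 1:
        raise ValueError("amino acid positions are 1-based")
    nt_start = 3 * (aa_position - 1) + 1
    return nt_start, nt_start + 2


if __name__ == "__main__":
    # The insertion described in the abstract: nucleotides 178-180.
    print(codon_to_aa_position(178, 180))
    print(aa_to_codon_span(60))
```

Under these assumptions the two functions are inverses of each other, so a position reported in either coordinate system can be converted to the other when comparing alignments.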
Bioinformatics