High rate of SARS-CoV2 nonsense spike genomes coding for prematurely truncated proteins

Alessio D'Alessandro
DOI: https://doi.org/10.48550/arXiv.2105.10074
2021-06-07
Abstract: Replication of SARS-CoV2 virions is an error-prone process that may generate a fraction of impaired protein copies completely lacking functionality. For instance, RNA mis-replication may introduce a very premature stop codon (UAG, UAA, UGA), yielding a prematurely truncated (nonsense-mutated) spike protein. In natural virus replication via cell infection, nonsense genomes are corrected by the proofreading enzymes of the virus, strongly penalized by natural selection, and condemned to a very short life by the host cell's mRNA surveillance mechanisms. However, for the very long spike genome of 1273 codons, a truncated non-functional spike protein may still occur with high frequency, even in the presence of a low mutation rate per single nucleotide. This paper provides a high-fidelity post-processing of SARS-CoV2 spike sequences: in ex-vivo samples from patients, an impressively high rate of 26% of prematurely stopped (nonsense-mutated) spike genome sequences due to insertions/deletions is found, compared with 9.7% obtained from in-vitro cell culture. A general warning on the possibly high rate of prematurely stopped spike protein sequences is also raised for "artificial" de novo DNA synthesis processes of SARS-CoV2 spike genomes with no associated natural proofreading/selection, possibly including vaccine preparations. Finally, a metric based on the ratio between prematurely stopped and "normal" genomes is proposed as a potential host-independent variant-watching tool, able to classify the infectivity of new spike mutations.
Genomics
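
The two quantitative ideas in the abstract (a long coding sequence accumulating premature stops despite a low per-site error rate, and a ratio of prematurely stopped to "normal" genomes) can be illustrated with a minimal Python sketch. This is not the paper's pipeline: the function names, the independence assumption in the probability estimate, and the convention that a stop at or before codon 1273 counts as premature are all assumptions made here for illustration.

```python
# Illustrative sketch only; not the author's post-processing method.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # DNA forms of UAA, UAG, UGA
SPIKE_CODONS = 1273  # spike length in codons, as quoted in the abstract


def first_stop_codon(seq):
    """Return the 1-based index of the first in-frame stop codon, or None."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i // 3 + 1
    return None


def is_prematurely_stopped(seq, expected_codons=SPIKE_CODONS):
    """Assumed convention: a stop within the first expected_codons is premature."""
    stop = first_stop_codon(seq)
    return stop is not None and stop <= expected_codons


def stopped_to_normal_ratio(seqs, expected_codons=SPIKE_CODONS):
    """Ratio of prematurely stopped sequences to all remaining sequences."""
    stopped = sum(is_prematurely_stopped(s, expected_codons) for s in seqs)
    normal = len(seqs) - stopped
    return stopped / normal if normal else float("inf")


def prob_at_least_one_stop(per_codon_rate, n_codons=SPIKE_CODONS):
    """P(at least one premature stop), assuming independent codons (hypothetical)."""
    return 1.0 - (1.0 - per_codon_rate) ** n_codons


# Toy usage with a 3-codon "gene" followed by its natural stop codon.
normal = "ATGAAATTTTAA"   # ATG AAA TTT TAA: stop only at codon 4
stopped = "ATGTAAAAATTT"  # ATG TAA ...: stop at codon 2
ratio = stopped_to_normal_ratio([normal, stopped, normal], expected_codons=3)
```

In the toy usage, one of three sequences is prematurely stopped, so `ratio` is 0.5. Because an insertion or deletion shifts the reading frame, scanning in the original frame from the start codon also catches the stops that such frameshifts expose downstream.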