Optimal entropic properties of SARS-CoV-2 RNA sequences

Marco Formentin, Roberto Chignola, Marco Favretti
DOI: https://doi.org/10.1098/rsos.231369
IF: 3.5
2024-01-01
Royal Society Open Science
Abstract: The reaction of the scientific community against the COVID-19 pandemic has generated a huge (approx. 10^6 entries) dataset of genome sequences collected worldwide and spanning a relatively short time window. These unprecedented conditions, together with the certain identification of the reference viral genome sequence, allow for an original statistical study of mutations in the virus genome. In this paper, we compute the Shannon entropy of every sequence in the dataset as well as the relative entropy and the mutual information between the reference sequence and the mutated ones. These functions, originally developed in information theory, measure the information content of a sequence and allow us to study the random character of the mutation mechanism in terms of its entropy and information gain or loss. We show that this approach allows us to recast in a new form known features of the SARS-CoV-2 mutation mechanism, such as the CT bias, and also to discover new optimal entropic properties of the mutation process, in the sense that the virus mutation mechanism closely tracks theoretically computable lower bounds for the entropy decrease and the information transfer.
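The three quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes sequences over the alphabet ACGU, estimates probabilities from single-base frequencies, and takes the reference and mutated sequences to be aligned position by position with equal length. The function names (`base_frequencies`, `shannon_entropy`, `relative_entropy`, `mutual_information`) are hypothetical and chosen only for this example.

```python
from collections import Counter
from math import log2

ALPHABET = "ACGU"  # assumption: RNA alphabet; other symbols are ignored


def base_frequencies(seq):
    """Empirical single-base distribution of a sequence."""
    counts = Counter(seq)
    n = sum(counts[b] for b in ALPHABET)
    return {b: counts[b] / n for b in ALPHABET}


def shannon_entropy(p):
    """H(p) = -sum_b p(b) log2 p(b), in bits."""
    return -sum(q * log2(q) for q in p.values() if q > 0)


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in bits.

    Returns infinity when q assigns zero probability to a base that p uses.
    """
    total = 0.0
    for b, pb in p.items():
        if pb > 0:
            if q[b] == 0:
                return float("inf")
            total += pb * log2(pb / q[b])
    return total


def mutual_information(ref, mut):
    """I(X; Y) in bits from the joint frequencies of aligned base pairs.

    Assumption: ref and mut are aligned, so position i in one sequence
    corresponds to position i in the other.
    """
    pairs = [(a, b) for a, b in zip(ref, mut) if a in ALPHABET and b in ALPHABET]
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(a for a, _ in pairs)
    py = Counter(b for _, b in pairs)
    # p(a,b) * log2( p(a,b) / (p(a) p(b)) ), written with raw counts
    return sum((c / n) * log2(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


# Toy usage with made-up sequences (not real genome data)
reference = "ACGUACGUCCCC"
mutated = "ACGUACGUUCCU"  # two C-to-U substitutions
p_ref = base_frequencies(reference)
p_mut = base_frequencies(mutated)
print(shannon_entropy(p_ref), shannon_entropy(p_mut))
print(relative_entropy(p_mut, p_ref))
print(mutual_information(reference, mutated))
```

With real data, the reference would be the reference viral genome and each mutated sequence would first need to be aligned to it; that alignment step is outside the scope of this sketch.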
multidisciplinary sciences