Worldwide tracing of mutations and the evolutionary dynamics of SARS-CoV-2
Zhong-Yin Zhou, Hang Liu, Yue-Dong Zhang, Yin-Qiao Wu, Min-Sheng Peng, Aimin Li, David M. Irwin, Haipeng Li, Jian Lu, Yiming Bao, Xuemei Lu, Di Liu, Ya-Ping Zhang
DOI: https://doi.org/10.1101/2020.08.07.242263
2020-01-01
bioRxiv
Abstract: Understanding the mutational and evolutionary dynamics of SARS-CoV-2 is essential for treating COVID-19 and developing a vaccine. Here, we analyzed 15,818 publicly available assembled SARS-CoV-2 genome sequences, along with 2,350 raw sequence datasets sampled worldwide. We investigated the distribution of inter-host single nucleotide polymorphisms (inter-host SNPs) and intra-host single nucleotide variations (iSNVs). Mutations were observed at 35.6% (10,649/29,903) of the bases in the genome. The substitution rate in some protein-coding regions is higher than the average for SARS-CoV-2, and the high substitution rate in some regions might be driven by diversifying selection to escape immune recognition. Both recurrent mutations and human-to-human transmission are mechanisms that generate fitness-advantageous mutations. Furthermore, the frequency of three mutations (S protein, F400L; ORF3a protein, T164I; and ORF1a protein, Q6383H) has gradually increased over time on lineages, which provides new clues for the early detection of fitness-advantageous mutations. Our study provides theoretical support for vaccine development and the optimization of treatment for COVID-19. We call on researchers to submit raw sequence data to public databases.
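The proportion of mutated bases reported in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' pipeline: the helper `mutated_fraction` and its input format (a list of 1-based genome positions) are assumptions introduced here, and only the two counts, 10,649 and 29,903, come from the abstract.

```python
# Genome length used as the denominator in the abstract (29,903 bases).
GENOME_LENGTH = 29_903


def mutated_fraction(mutated_positions, genome_length=GENOME_LENGTH):
    """Return the fraction of genome positions carrying at least one mutation.

    Hypothetical helper: positions are assumed to be 1-based integers.
    Repeated positions are counted once; out-of-range positions are ignored.
    """
    unique_sites = {p for p in mutated_positions if 1 <= p <= genome_length}
    return len(unique_sites) / genome_length


# Reproduce the abstract's percentage from its reported counts.
reported_fraction = 10_649 / GENOME_LENGTH
print(f"{reported_fraction:.1%}")  # prints 35.6%
```

Counting each position once matters because recurrent mutations at the same site would otherwise inflate the fraction.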