A critical reexamination of recovered SARS-CoV-2 sequencing data

Florence Débarre, Zach Hensel
DOI: https://doi.org/10.1101/2024.02.15.580500
2024-08-22
Abstract: SARS-CoV-2 genomes collected at the onset of the Covid-19 pandemic are valuable because they could help us understand how the virus entered the human population. In 2021, Jesse Bloom reported on the recovery of a dataset of raw sequencing reads that had been removed from the NCBI SRA database at the request of the data generators, a scientific team at Wuhan University (Wang et al., 2020b). Bloom concluded that the data deletion had obfuscated the origin of SARS-CoV-2 and suggested that the deletion may have been requested to comply with a government order; further, he questioned the reported sample collection dates on and after January 30, 2020. Here, we show that sample collection dates were published in 2020 by Wang et al. together with the sequencing reads, and that they match the dates given by the authors in 2021. Collection dates of January 30, 2020 were manually removed by Bloom during his analysis of the data. We examine mutations in these sequences and confirm that they are entirely consistent with the previously known genetic diversity of SARS-CoV-2 in late January 2020. Finally, we explain how an apparent phylogenetic rooting paradox described by Bloom was resolved by subsequent analysis. Our reanalysis demonstrates that there was no basis to question the sample collection dates published by Wang et al.
Evolutionary Biology