The publicly-accessible RNA barcode segments based on the genetic tests of complete genome sequences for SARS-CoV-2 identification from HCoVs and SARSr-CoV-2 lineages
Changqiao You, Shuai Jiang, Yunyun Ding, Shunxing Ye, Xiaoxiao Zou, Hongming Zhang, Zeqi Li, Fenglin Chen, Yongliang Li, Xingyi Ge, Xinhong Guo
DOI: https://doi.org/10.1016/j.virs.2024.01.006
IF: 6.947
2024-01-21
Virologica Sinica
Abstract:
Highlights
• Using SNP site analysis, highly conserved barcode segments of SARS-CoV-2 were preliminarily extracted from public databases.
• All main and subordinate barcode segments achieved an identification precision of 100% for SARS-CoV-2.
• The species-specific barcode segments of SARS-CoV-2 were mainly distributed in the ORF1ab, S, E, ORF7a, and N coding sequences.
• http://virusbarcodedatabase.top/ offers visitors access to SARS-CoV-2 barcode segments and tools for segment creation.
• Barcode segments could effectively and stably distinguish SARS-CoV-2 from human coronaviruses and SARS-CoV-2 related lineages.
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the pathogen responsible for coronavirus disease 2019 (COVID-19), continues to evolve, giving rise to more variants and global reinfections. Previous research has demonstrated that barcode segments can effectively and cost-efficiently identify specific species within closely related populations. In this study, we designed and tested RNA barcode segments based on genetic evolutionary relationships to facilitate the efficient and accurate identification of SARS-CoV-2 among extensive virus samples, including human coronaviruses (HCoVs) and SARSr-CoV-2 lineages. Nucleotide sequences sourced from NCBI and GISAID were selected and curated to construct training sets encompassing 1,733 complete genome sequences of HCoVs and SARSr-CoV-2 lineages. Through genetic-level species testing, we validated the accuracy and reliability of the barcode segments for identifying SARS-CoV-2. Subsequently, 75 main and subordinate species-specific barcode segments for SARS-CoV-2, located in the ORF1ab, S, E, ORF7a, and N coding sequences, were extracted and screened based on single-nucleotide polymorphism sites and weighted scores.
In testing, these segments showed high recall (nearly 100%), specificity (almost 30% at the nucleotide level), and precision (100%) for identification. They were then visualized as combined one- and two-dimensional barcodes and deposited in an online database (http://virusbarcodedatabase.top/). The successful integration of barcoding technology into SARS-CoV-2 identification provides valuable insights for future studies involving polymorphism analysis of complete genome sequences. Moreover, this cost-effective and efficient identification approach provides a valuable reference for future research on virus surveillance.
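To make the recall and precision figures concrete, the following is a minimal sketch of how such metrics could be computed for a single barcode segment. It is not the authors' pipeline: the function name evaluate_segment, the exact-substring matching rule, and the toy sequences are all assumptions chosen for illustration.

```python
from typing import Iterable, Tuple


def evaluate_segment(
    segment: str, genomes: Iterable[Tuple[str, bool]]
) -> Tuple[float, float]:
    """Return (recall, precision) for one barcode segment.

    Assumption: a genome is called positive when it contains the segment
    as an exact substring. Each item in `genomes` is (sequence, is_target),
    where is_target marks genomes of the species to be identified.
    """
    tp = fp = fn = 0
    for sequence, is_target in genomes:
        hit = segment in sequence
        if hit and is_target:
            tp += 1  # target genome correctly identified
        elif hit:
            fp += 1  # non-target genome wrongly identified
        elif is_target:
            fn += 1  # target genome missed
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision


# Hypothetical toy data, not real viral sequences.
toy_segment = "ACGTTGCA"
toy_genomes = [
    ("GGACGTTGCATT", True),   # target, contains the segment
    ("TTACGTTGCAGG", True),   # target, contains the segment
    ("GGACGATGCATT", True),   # target, one SNP inside the segment
    ("CCCCGGGGTTTT", False),  # non-target, no match
    ("ACGTAGCAACGT", False),  # non-target, no match
]

recall, precision = evaluate_segment(toy_segment, toy_genomes)
print(f"recall={recall:.3f} precision={precision:.3f}")
```

In this toy set, a single SNP inside the segment causes one target genome to be missed, which lowers recall while precision is unaffected. That is one reason highly conserved regions are attractive when choosing barcode segments.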
virology