AccuVIR: an ACCUrate VIRal genome assembly tool for third-generation sequencing data

Runzhou Yu,Dehan Cai,Yanni Sun
DOI: https://doi.org/10.1093/bioinformatics/btac827
IF: 5.8
2022-12-26
Bioinformatics
Abstract:Abstract Motivation RNA viruses tend to mutate constantly. While many of the variants are neutral, some can lead to higher transmissibility or virulence. Accurate assembly of complete viral genomes enables the identification of underlying variants, which are essential for studying virus evolution and elucidating the relationship between genotypes and virus properties. Recently, third-generation sequencing platforms such as Nanopore sequencers have been used for real-time virus sequencing for Ebola, Zika, coronavirus disease 2019, etc. However, their high per-base error rate prevents the accurate reconstruction of the viral genome. Results In this work, we introduce a new tool, AccuVIR, for viral genome assembly and polishing using error-prone long reads. It can better distinguish sequencing errors from true variants based on the key observation that sequencing errors can disrupt the gene structures of viruses, which usually have a high density of coding regions. Our experimental results on both simulated and real third-generation sequencing data demonstrated its superior performance on generating more accurate viral genomes than generic assembly or polish tools. Availability and implementation The source code and the documentation of AccuVIR are available at https://github.com/rainyrubyzhou/AccuVIR. Supplementary information Supplementary data are available at Bioinformatics online.
biochemical research methods,biotechnology & applied microbiology,mathematical & computational biology
What problem does this paper attempt to address?