Technical report on best practices for hybrid and long read de novo assembly of bacterial genomes utilizing Illumina and Oxford Nanopore Technologies reads

Kay Nieselt, Simon Tim Hackl, Theresa Anisja Harbig
DOI: https://doi.org/10.1101/2022.10.25.513682
2022-10-27
bioRxiv
Abstract: The emergence of commercial long read sequencing technologies in the 2010s and the concomitant development of new bioinformatics tools bear the potential of de novo genome assemblies of unprecedented contiguity and quality. However, to this day these novel technologies suffer from high rates of sequencing errors. These may be overcome by using long and short reads in combination, in so-called hybrid approaches, or by increasing the throughput and thereby the coverage of sequencing runs. The latter in particular inevitably increases the cost of the assembly. Herein, up-to-date long read and hybrid assemblers were tested on real whole genome sequencing Illumina and Oxford Nanopore Technologies read data sets, and on subsamples of these, in order to elaborate a best practice for de novo assembly. The findings suggest that although long reads alone can be used to reconstruct complete and contiguous genomes, the single-nucleotide and indel error rates in particular remain high compared to hybrid approaches, and that this can negatively impact downstream applications such as variation discovery and gene prediction.