Matching Excellence: ONT’s Rise to Parity with PacBio in Genome Reconstruction of Non-Model Bacterium with High GC Content

Axel Soto-Serrano, Wenwen Li, Farhad M. Panah, Yan Hui, Pablo Atienza, Alexey Fomenkov, Richard J. Roberts, Paulina Deptula, Lukasz Krych
DOI: https://doi.org/10.1101/2024.02.26.582104
2024-02-27
Abstract: Reconstruction of complete bacterial genomes is a vital aspect of microbial research, as it provides complex information about genetic content, gene ontology, and regulation. It has become the domain of third-generation, long-read sequencing platforms, as short-read technologies mainly deliver fragmented genomes. The PacBio platform can provide high-quality complete genomes, yet remains one of the most expensive sequencing strategies. Oxford Nanopore Technologies (ONT) offers the advantage of producing the longest reads while also being the most cost-effective option in terms of platform costs, library preparation, and sequencing. However, ONT's error rate, although significantly reduced lately, still meets a certain level of distrust in the scientific community. In recent years, hybrid assembly of Nanopore and Illumina data has been used to address ONT's error rate and has yielded the best results in terms of genome completeness, quality, and price. However, the latest advancements in Nanopore technology, including new flow cells (R10.4.1), new library preparation chemistry (V14) and duplex mode, updated basecallers (Dorado v0.4.1), and the realization that sequencing in dark mode results in significantly increased throughput, have had a significant impact on the quality of generated data and, thus, on the recovery of complete genomes by ONT sequencing alone. In this study, we compared the data generated by ONT using three sequencing strategies (Native barcoding, RAPID barcoding, and the custom-developed BARSEQ) against PacBio and Illumina (NextSeq) as well as Illumina-ONT hybrid data. For this purpose, we employed three strains of actinobacteria, whose genomes have proven difficult to reconstruct due to high GC content, regions of repeated sequences, and massive genome rearrangements.
Our data indicate that DNA libraries prepared with the native barcoding kit, sequenced with V14 chemistry on an R10.4.1 flow cell, and assembled with Flye resulted in the reconstruction of complete genomes of overall quality highly similar to that of genomes reconstructed with PacBio. The highest level of quality can be achieved by hybrid assembly of data from the Native barcoding kit complemented with data from the custom-developed BARSEQ, both sequenced on R10.4.1 flow cells. In conclusion, our results demonstrate that ONT can be used as a cost-effective sequencing strategy, without the need for complementing with other sequencing technologies, for the reconstruction of complete genomes of the highest quality.
Genomics