Abstract: Background: Assembly algorithm choice should be a deliberate, well-justified decision when researchers create genome assemblies for eukaryotic organisms from third-generation sequencing technologies. While third-generation sequencing by Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) has overcome the short read lengths characteristic of next-generation sequencing (NGS), third-generation sequencers are known to produce more error-prone reads, which creates a new set of challenges for assembly algorithms and pipelines. The introduction of HiFi reads, which offer substantially reduced error rates, has provided a promising route to more accurate assemblies. Since the introduction of third-generation sequencing technologies, many tools have been developed to take advantage of the longer reads, and researchers need to choose an appropriate assembler for their projects.
Results: We benchmarked state-of-the-art long-read de novo assemblers to help readers make a balanced choice for the assembly of eukaryotes. To this end, we used 12 real and 64 simulated datasets from different eukaryotic genomes, with different read length distributions, imitating PacBio continuous long-read (CLR), PacBio high-fidelity (HiFi), and ONT sequencing. For ONT and PacBio CLR reads, we include 5 commonly used long-read assemblers: Canu, Flye, Miniasm, Raven, and wtdbg2. For PacBio HiFi reads, we include 5 state-of-the-art HiFi assemblers: HiCanu, Flye, Hifiasm, LJA, and MBG. The evaluation covers the following categories: reference-based metrics, assembly statistics, misassembly count, BUSCO completeness, runtime, and RAM usage. Additionally, we investigated the effect of increased read length on assembly quality and report that read length can, but does not always, positively impact assembly quality.
Conclusions: Our benchmark shows that no assembler performs best in all evaluation categories. Overall, however, Flye is the best-performing assembler for PacBio CLR and ONT reads, on both real and simulated data, while the best-performing PacBio HiFi assemblers are Hifiasm and LJA. The benchmark with longer reads shows that increased read length improves assembly quality, but the extent of the improvement depends on the size and complexity of the reference genome.

Genome sequence assembly evaluation using long-range sequencing data

A Novel High-Accuracy Genome Assembly Method Utilizing a High-Throughput Workflow

Comparison of the Two Up-to-date Sequencing Technologies for Genome Assembly: HiFi Reads of Pacific Biosciences Sequel II System and Ultralong Reads of Oxford Nanopore

Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species

DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies

Improving and Correcting the Contiguity of Long-Read Genome Assemblies of Three Plant Species Using Optical Mapping and Chromosome Conformation Capture Data

Benchmarking multi-platform sequencing technologies for human genome assembly

Assessment of Metagenomic Assemblers Based on Hybrid Reads of Real and Simulated Metagenomic Sequences

MECAT: an ultra-fast mapping, error correction and de novo assembly tool for single-molecule sequencing reads

A pilot study for channel catfish whole genome sequencing and de novo assembly

Evaluating long-read de novo assembly tools for eukaryotic genomes: insights and considerations

Scalable long read self-correction and assembly polishing with multiple sequence alignment

BAUM: A DNA Assembler by Adaptive Unique Mapping and Local Overlap-Layout-Consensus

Benchmarking of next and third generation sequencing technologies and their associated algorithms for de novo genome assembly

Benchmarking de novo assembly methods on metagenomic sequencing data

Versatile genome assembly evaluation with QUAST-LG

Chrom-pro: A User-Friendly Toolkit for De-novo Chromosome Assembly and Genomic Analysis

Accelerating De Bruijn Graph-Based Genome Assembly for High-Throughput Short Read Data

misFinder: identify mis-assemblies in an unbiased manner using reference and paired-end reads

MAECI: A Pipeline for Generating Consensus Sequence with Nanopore Sequencing Long-read Assembly and Error Correction