Abstract: Genome assembly is a computational technique that pieces together deoxyribonucleic acid (DNA) fragments generated by sequencing technologies to create a comprehensive and precise representation of the entire genome. Generating a high-quality human reference genome is a crucial prerequisite for understanding human biology and is vital for downstream analysis of genomic variation. Over the past few decades, many efforts have been made to create a complete and gapless human reference genome using a diverse range of advanced sequencing technologies. Several tools are available to enhance the quality of haploid and diploid human genome assemblies, covering contig assembly, polishing of contig errors, scaffolding and variant phasing. Selecting the appropriate tools and technologies remains a daunting task, even though several studies have investigated the pros and cons of different assembly strategies. The goal of this paper was to benchmark various strategies for human genome assembly by combining sequencing technologies and tools on two publicly available samples (NA12878 and NA24385) from Genome in a Bottle. We compared their performance in terms of continuity, accuracy, completeness, variant calling and phasing. We observed that PacBio HiFi long reads are the optimal choice for generating an assembly with low base errors. In contrast, Oxford Nanopore long reads produced the most continuous contigs, but these may require further polishing to improve quality. We recommend polishing with short reads, rather than with the long reads themselves, to improve the base accuracy of contigs assembled from Oxford Nanopore long reads. Hi-C is the best choice for chromosome-level scaffolding because it captures longer-range DNA connectedness than 10× linked-reads and Bionano optical maps. However, a combination of multiple technologies can be used to further improve the quality and completeness of genome assembly.
For human diploid genome assembly, hifiasm is the best tool when PacBio HiFi and Hi-C data are used. Looking ahead, we expect that further advances in human diploid assemblers will leverage PacBio HiFi reads and other technologies that capture long-range DNA connectedness to enable high-quality, chromosome-level and haplotype-resolved human genome assemblies.

Benchmarking multi-platform sequencing technologies for human genome assembly
