Abstract: Genome assembly is a computational technique that pieces together deoxyribonucleic acid (DNA) fragments generated by sequencing technologies to create a comprehensive and precise representation of the entire genome. A high-quality human reference genome is a crucial prerequisite for understanding human biology and is vital for downstream analysis of genomic variation. Over the past few decades, many efforts have been made to create a complete, gapless human reference genome using a diverse range of advanced sequencing technologies. Several tools are available to improve the quality of haploid and diploid human genome assemblies at each stage: contig assembly, polishing of contig errors, scaffolding and variant phasing. Selecting the appropriate tools and technologies remains a daunting task, although several studies have investigated the pros and cons of different assembly strategies. The goal of this paper was to benchmark strategies for human genome assembly by combining sequencing technologies and tools on two publicly available samples (NA12878 and NA24385) from Genome in a Bottle. We compared their performance in terms of continuity, accuracy, completeness, variant calling and phasing. We observed that PacBio HiFi long reads are the optimal choice for generating an assembly with few base errors. Oxford Nanopore long reads, on the other hand, produced the most continuous contigs, but these contigs may require further polishing to improve their quality. We recommend polishing contigs assembled from Oxford Nanopore long reads with short reads rather than with the long reads themselves to improve base accuracy. Hi-C is the best choice for chromosome-level scaffolding because it captures longer-range DNA connectedness than 10× linked reads and Bionano optical maps. Nevertheless, combining multiple technologies can further improve the quality and completeness of a genome assembly. For human diploid genome assembly, hifiasm with PacBio HiFi and Hi-C data was the best tool. Looking to the future, we expect that further advancements in human diploid assemblers will leverage PacBio HiFi reads together with other technologies that capture long-range DNA connectedness to enable the generation of high-quality, chromosome-level and haplotype-resolved human genome assemblies.
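To make the comparison criteria concrete, the following is a minimal sketch of two summary statistics commonly used for continuity and base accuracy: contig N50 and a Phred-scaled quality value (QV). The abstract does not state which metrics the benchmark used, so both the choice of metrics and the function names `n50` and `phred_qv` are assumptions made here for illustration only.

```python
import math


def n50(contig_lengths):
    """Return the contig N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly size.
    (Illustrative helper, not taken from the paper.)"""
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def phred_qv(error_bases, total_bases):
    """Return a Phred-scaled consensus quality: QV = -10 * log10(error rate).
    (Illustrative helper, not taken from the paper.)"""
    if total_bases <= 0:
        raise ValueError("total_bases must be positive")
    if error_bases == 0:
        return math.inf  # no observed errors
    return -10 * math.log10(error_bases / total_bases)


if __name__ == "__main__":
    # Toy contig lengths, not real assembly data.
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
    print(phred_qv(1, 1000))
```

Under these definitions, a more continuous assembly has a higher N50, and a more accurate assembly has a higher QV, which is the sense in which a technology can lead on one criterion and trail on the other.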
