Physical separation of haplotypes in dikaryons allows benchmarking of phasing accuracy in Nanopore and HiFi assemblies with Hi-C data

Hongyu Duan,Ashley W. Jones,Tim Hewitt,Amy Mackenzie,Yiheng Hu,Anna Sharp,David Lewis,Rohit Mago,Narayana M. Upadhyaya,John P. Rathjen,Eric A. Stone,Benjamin Schwessinger,Melania Figueroa,Peter N. Dodds,Sambasivam Periyannan,Jana Sperschneider
DOI: https://doi.org/10.1186/s13059-022-02658-2
IF: 17.906
2022-03-25
Genome Biology
Abstract:Abstract Background Most animals and plants have more than one set of chromosomes and package these haplotypes into a single nucleus within each cell. In contrast, many fungal species carry multiple haploid nuclei per cell. Rust fungi are such species with two nuclei (karyons) that contain a full set of haploid chromosomes each. The physical separation of haplotypes in dikaryons means that, unlike in diploids, Hi-C chromatin contacts between haplotypes are false-positive signals. Results We generate the first chromosome-scale, fully-phased assembly for the dikaryotic leaf rust fungus Puccinia triticina and compare Nanopore MinION and PacBio HiFi sequence-based assemblies. We show that false-positive Hi-C contacts between haplotypes are predominantly caused by phase switches rather than by collapsed regions or Hi-C read mis-mappings. We introduce a method for phasing of dikaryotic genomes into the two haplotypes using Hi-C contact graphs, including a phase switch correction step. In the HiFi assembly, relatively few phase switches occur, and these are predominantly located at haplotig boundaries and can be readily corrected. In contrast, phase switches are widespread throughout the Nanopore assembly. We show that haploid genome read coverage of 30–40 times using HiFi sequencing is required for phasing of the leaf rust genome, with 0.7% heterozygosity, and that HiFi sequencing resolves genomic regions with low heterozygosity that are otherwise collapsed in the Nanopore assembly. Conclusions This first Hi-C based phasing pipeline for dikaryons and comparison of long-read sequencing technologies will inform future genome assembly and haplotype phasing projects in other non-haploid organisms.
genetics & heredity,biotechnology & applied microbiology
What problem does this paper attempt to address?