Simultaneous de novo calling and phasing of genetic variants at chromosome-scale using NanoStrand-seq

Xiuzhen Bai, Zonggui Chen, Kexuan Chen, Zixin Wu, Rui Wang, Jun'e Liu, Liang Chang, Lu Wen, Fuchou Tang
DOI: https://doi.org/10.1038/s41421-024-00694-9
2024-07-09
Abstract: The completion of the first telomere-to-telomere human genome assembly, T2T-CHM13, marked a milestone in achieving completeness of the human reference genome. The upcoming era of genome study will focus on fully phased diploid genome assembly, with an emphasis on genetic differences between individual haplotypes. Most existing sequencing approaches achieve only localized haplotype phasing and rely on additional pedigree information for whole-chromosome scale phasing. The short-read-based Strand-seq method can directly phase single nucleotide polymorphisms (SNPs) at whole-chromosome scale but falls short when it comes to phasing structural variations (SVs). To address this limitation, we developed a Nanopore sequencing platform-based Strand-seq approach, which we named NanoStrand-seq. This method allowed for de novo SNP calling with high precision (99.52%) and achieved a superior phasing accuracy (0.02% Hamming error rate) at whole-chromosome scale, a level of performance comparable to Strand-seq for haplotype phasing of the GM12878 genome. Importantly, we demonstrated that NanoStrand-seq can efficiently resolve the MHC locus, a highly polymorphic genomic region. Moreover, NanoStrand-seq enabled independent direct calling and phasing of deletions and insertions at whole-chromosome level; when applied to long genomic regions of SNP homozygosity, it outperformed the strategy that combined Strand-seq with bulk long-read sequencing. Finally, we showed that, like Strand-seq, NanoStrand-seq was also applicable to primary cultured cells. Together, we provide a novel methodology that enables interrogation of a full spectrum of haplotype-resolved SNPs and SVs at whole-chromosome scale, with broad applications for species with diploid or even potentially polyploid genomes.
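The abstract reports phasing accuracy as a Hamming error rate. As a rough illustration of what such a figure measures, the sketch below uses the common definition: the fraction of heterozygous sites assigned to the wrong haplotype, taken as the minimum over the two possible haplotype labelings. The function name `hamming_error_rate` and the 0/1 allele encoding are assumptions made for this example, and the paper's exact computation may differ.

```python
from typing import Sequence


def hamming_error_rate(truth: Sequence[int], predicted: Sequence[int]) -> float:
    """Fraction of heterozygous sites phased incorrectly (hypothetical helper).

    Each sequence lists, per heterozygous site, which allele (0 or 1)
    sits on haplotype 1. Haplotype labels are arbitrary, so the error
    count is the smaller of the mismatches under the two labelings.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must cover the same sites")
    if len(truth) == 0:
        raise ValueError("at least one heterozygous site is required")

    # Mismatches when predicted haplotype 1 is compared to true haplotype 1.
    mismatches = sum(t != p for t, p in zip(truth, predicted))
    # Swapping the haplotype labels turns every match into a mismatch.
    errors = min(mismatches, len(truth) - mismatches)
    return errors / len(truth)


# Toy data (not from the paper): predicted labels are swapped relative
# to truth, with a single site phased inconsistently.
toy_truth = [0, 1, 0, 1, 1, 0, 0, 1, 0, 1]
toy_predicted = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]
toy_rate = hamming_error_rate(toy_truth, toy_predicted)
```

Under this definition, a rate of 0.02% corresponds to about 2 wrongly phased sites per 10,000 heterozygous sites.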