Advancing long-read nanopore genome assembly and accurate variant calling for rare disease detection

Shloka Negi, Sarah L Stenton, Seth Berger, Brandy McNulty, Ivo Violich, Joshua Gardner, Todd Hillaker, Sara M O'Rourke, Melanie C O'Leary, Elizabeth Carbonell, Christina Austin-Tse, Gabrielle Lemire, Jillian Serrano, Brian Mangilog, Grace VanNoy, Mikhail Kolmogorov, Eric Vilain, Anne O'Donnell-Luria, Emmanuele Delot, Karen H Miga, Jean Monlong, Benedict Paten
DOI: https://doi.org/10.1101/2024.08.22.24312327
2024-08-22
Abstract: More than 50% of families with suspected rare monogenic diseases remain unsolved after whole-genome analysis by short-read sequencing (SRS). Long-read sequencing (LRS) could help bridge this diagnostic gap by capturing variants inaccessible to SRS, facilitating long-range mapping and phasing, and providing haplotype-resolved methylation profiling. To evaluate the additional diagnostic yield of LRS, we used nanopore sequencing on a rare disease cohort of 98 samples, including 41 probands and some of their family members, achieving an average per-sample coverage of ~36x and a read N50 of 32 kilobases (kb) from a single flow cell. Our Napu pipeline generated assemblies, phased variants, and methylation calls. On average, LRS covered coding exons in ~280 genes and ~5 known Mendelian disease genes that were not covered by SRS. Compared with SRS, LRS detected additional rare, functionally annotated variants, including structural variants (SVs) and tandem repeats, and completely phased 87% of protein-coding genes. LRS also detected additional de novo variants and could be used to distinguish postzygotic mosaic variants from prezygotic de novo variants. Eleven probands were solved, with diverse underlying genetic causes including de novo and compound heterozygous variants, large-scale SVs, and epigenetic modifications. Our study demonstrates the potential of LRS to enhance diagnostic yield for rare monogenic diseases, suggesting utility in future clinical genomics workflows.
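The two sequencing-quality figures quoted above, read N50 and average coverage, follow standard definitions. The sketch below illustrates those generic definitions only. It is not taken from the Napu pipeline, and the function names `read_n50` and `mean_coverage` are hypothetical. The N50 is the read length L such that reads of length at least L together contain at least half of all sequenced bases. Average coverage is total sequenced bases divided by genome size.

```python
def read_n50(read_lengths):
    """Hypothetical helper: return the N50, the length L such that reads
    of length >= L contain at least half of all sequenced bases."""
    lengths = sorted(read_lengths, reverse=True)  # longest reads first
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:  # cumulative bases reach half the total
            return length
    raise ValueError("read_lengths must be non-empty")


def mean_coverage(read_lengths, genome_size):
    """Hypothetical helper: average sequencing depth, computed as
    total sequenced bases divided by genome size (same units)."""
    return sum(read_lengths) / genome_size


# Toy example: 30 bases in total, so half is 15; 10 + 6 = 16 >= 15.
print(read_n50([2, 3, 4, 5, 6, 10]))        # N50 of 6
print(mean_coverage([100, 200, 300], 60))  # 600 / 60 = 10.0x
```

Under an assumed human genome size of roughly 3.1 Gb, the ~36x average reported above would correspond to on the order of 110 Gb of sequence per sample.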
Genetic and Genomic Medicine