Clinical application of Complete Long Read genome sequencing identifies a 16kb intragenic duplication in EHMT1 in a patient with suspected Kleefstra syndrome

John E Gorzynski,Shruti Marwaha,Chloe M Reuter,Tanner Jensen,Alexis Ferrasse,Archana Raja,Liliana Fernandez,Elijah Kravets,Jennefer Carter,Devon Bonner,Shirley Sutton,Undiagnosed Diseases Network (UDN),Maura Ruzhnikov,Louanna Hudgins,Paul G Fisher,Jonathan Bernstein,Matthew T Wheeler,Euan A Ashley
DOI: https://doi.org/10.1101/2024.03.28.24304304
2024-06-05
Abstract:Long read sequencing offers benefits for the detection of structural variation in Mendelian disease. Here, we applied a new technology that generates contiguous long reads via tagmentation and sequencing by synthesis to a small cohort of patients with undiagnosed disease from the Undiagnosed Diseases Network. We first compare sequencing from the HG002 benchmark sample from Genome In A Bottle using nanopore sequencing (R10.4.1, duplex reads, Oxford Nanopore), single molecule real time sequencing (Revio SMRT cell, Pacific Biosciences) and complete long read sequencing (S4 flowcell, Novaseq, Illumina). Coverage was 33-35x across platforms. Read length N50 was 6.5kb (ICLR), 16.9kb (SMRT), and 33.8kb (ONT). We noted small differences in single nucleotide variant F1 scores across long read technologies with single nucleotide variant F1 scores (0.985-0.999) exceeding indel scores (0.78-0.99) and structural variant scores (0.74-0.96). We applied CLR sequencing to seven undiagnosed patients. In one patient, we detected and prioritized a novel 16kb intragenic duplication encompassing exons 5 and 6 in EHMT1. Resolution of the breakpoints and examination of flanking sequences revealed that the duplication was present in tandem and was predicted to result in a frameshift of the amino acid sequence and an early termination codon. It resulted in a diagnosis of Kleefstra syndrome. The variant was confirmed with targeted EHMT1 clinical testing and detected via nanopore and SMRT sequencing. In summary, we report the early clinical application of complete long read sequencing to a small cohort of undiagnosed patients.
What problem does this paper attempt to address?