HiFi long-read genomes for difficult-to-detect clinically relevant variants
Wolfram Hoeps, Marjan M. Weiss, Ronny Derks, Jordi Corominas Galbany, Amber den Ouden, Simone van den Heuvel, Raoul Timmermans, Jos Smits, Tom Mokveld, Egor Dolzhenko, Xiao Chen, Arthur van den Wijngaard, Michael A. Eberle, Helger G. Yntema, Alexander Hoischen, Christian Gilissen, Lisenka E.L.M. Vissers
DOI: https://doi.org/10.1101/2024.09.17.24313798
2024-09-19
Abstract: Clinical short-read exome and genome sequencing approaches have positively impacted diagnostic testing for rare diseases. Yet, technical limitations associated with short reads challenge their use for detection of disease-associated variation in complex regions of the genome. Long-read sequencing (LRS) technologies may overcome these challenges, potentially qualifying as a first-tier test for all rare diseases. To test this hypothesis, we performed LRS (30x HiFi genomes) for 100 samples with 145 known clinically relevant germline variants that are challenging to detect using short-read sequencing and necessitate a broad range of complementary test modalities in diagnostic laboratories.
We show that relevant variant callers readily re-identify the majority of variants (120/145, 83%), including ~90% of structural variants, SNVs/InDels in homologous sequences and expansions of short tandem repeats. Another 10% (n=14) were visually apparent in the data but not automatically detected. Our analyses also identified systematic challenges for the remaining 7% (n=11) of variants, such as the detection of AG-rich repeat expansions. Titration analysis showed that 89% of all automatically called variants could also be identified using 15-fold coverage.
Thus, long-read genomes identified 93% of pathogenic variants that are most challenging to detect using short-read technologies. Even with reduced coverage, the vast majority of variants remained detectable, which may facilitate cost-effective diagnostic implementation. Most importantly, we show the potential to use a single technology to accurately identify all types of clinically relevant variants.
What problem does this paper attempt to address?
The paper addresses the problem that, in clinical diagnostics, certain complex genetic variants are difficult to detect with short-read sequencing (SRS), so diagnostic laboratories must rely on a broad range of complementary tests. To assess whether a single technology could cover these cases, the research team used high-fidelity (HiFi) long-read sequencing (LRS) to re-identify 145 known clinically relevant variants in 100 samples. These variants include short tandem repeat (STR) expansions, structural variants (SVs), and single-nucleotide variants and small insertions/deletions (SNVs/InDels) located in homologous sequences, among others.
Using 30x HiFi genome sequencing data, the study found that the majority of variants (120/145, approximately 83%) were detected automatically by variant callers, including about 90% of structural variants, SNVs/InDels in homologous sequences, and short tandem repeat expansions. A further 10% (14) were visually apparent in the sequencing data but were not called automatically. The remaining 7% (11) were not detected, and the analysis points to systematic challenges for these, such as the detection of AG-rich repeat expansions. Coverage titration further showed that 89% of the automatically called variants could still be identified at 15x coverage. Because most variants remain detectable at lower coverage, diagnostic implementation may become more cost-effective.
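The headline proportions can be rechecked directly from the reported counts. The short Python sketch below divides each count by the 145 variants; the category labels are our own shorthand, not the paper's terminology. Computed exactly, the shares come to about 82.8%, 9.7% and 7.6%, and automatic plus visual detection together to about 92.4%, so the percentages quoted in the text appear to be rounded. The sketch also makes explicit that the 89% titration figure refers to the 120 automatically called variants, not to all 145; the resulting value of about 106.8 is only a back-of-the-envelope estimate, since the exact 15x count is not given here.

```python
# Counts reported in the abstract (145 known clinically relevant variants in total).
TOTAL = 145
counts = {
    "called automatically": 120,
    "visible on manual inspection only": 14,
    "not detected": 11,
}

# The three categories are mutually exclusive and should account for every variant.
assert sum(counts.values()) == TOTAL


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Share of all 145 variants in each detection category.
shares = {label: n_variants for label, n_variants in
          ((label, percent(n, TOTAL)) for label, n in counts.items())}

# Automatic calls plus visually apparent variants, as a share of all variants.
detected = percent(
    counts["called automatically"] + counts["visible on manual inspection only"], TOTAL
)

# The 15x titration figure (89%) is relative to the 120 automatic calls,
# so this is only an approximate count, not a number reported by the paper.
approx_called_at_15x = 0.89 * counts["called automatically"]

for label, share in shares.items():
    print(f"{label}: {share}%")
print(f"detected in total (automatic + visual): {detected}%")
print(f"approx. automatic calls retained at 15x: {approx_called_at_15x:.1f}")
```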
In summary, long-read genomes identified 93% of these pathogenic variants that are hardest to detect with short reads, and the study demonstrates the potential of a single technology to accurately identify all types of clinically relevant variants, which may simplify the diagnostic process for rare diseases.