STRchive: a dynamic resource detailing population-level and locus-specific insights at tandem repeat disease loci

Laurel Hiatt, Ben Weisburd, Egor Dolzhenko, Grace E. VanNoy, Edibe Nehir Kurtas, Heidi L. Rehm, Aaron Quinlan, Harriet Dashnow
DOI: https://doi.org/10.1101/2024.05.21.24307682
2024-05-21
Abstract: Approximately 3% of the human genome consists of repetitive elements called tandem repeats (TRs), which include short tandem repeats (STRs) of 1-6bp motifs and variable number tandem repeats (VNTRs) of 7+bp motifs. TR variants contribute to several dozen mono- and polygenic diseases but remain understudied and 'enigmatic,' particularly relative to single nucleotide variants. It remains comparatively challenging to interpret the clinical significance of TR variants. Although existing resources provide portions of necessary data for interpretation at disease-associated loci, it is currently difficult or impossible to efficiently invoke the additional details critical to proper interpretation, such as motif pathogenicity, disease penetrance, and age of onset distributions. It is also often unclear how to apply population information to analyses. We present STRchive (S-T-archive, http://strchive.org/), a dynamic resource consolidating information on TR disease loci in humans from research literature, up-to-date clinical resources, and large-scale genomic databases, with the goal of streamlining TR variant interpretation at disease-associated loci. We apply STRchive (including pathogenic thresholds, motif classification, and clinical phenotypes) to a gnomAD cohort of ~18.5k individuals genotyped at 60 disease-associated loci. Through detailed literature curation, we demonstrate that the majority of TR diseases affect children despite being thought of as adult diseases. Additionally, we show that pathogenic genotypes can be found within gnomAD which do not necessarily overlap with known disease prevalence, and leverage STRchive to interpret locus-specific findings therein. We apply a diagnostic blueprint empowered by STRchive to relevant clinical vignettes, highlighting possible pitfalls in TR variant interpretation. As a living resource, STRchive is maintained by experts, takes community contributions, and will evolve as understanding of TR diseases progresses.
What problem does this paper attempt to address?
The paper addresses how to interpret the clinical significance of tandem repeat (TR) variants more effectively, particularly at disease-associated loci. Specifically, it aims to:

1. **Integrate detailed information on TR disease loci**: STRchive, a dynamic resource, consolidates information from the research literature, up-to-date clinical resources, and large-scale genomic databases to help researchers and clinicians understand the genetic basis of TR diseases.
2. **Simplify TR variant interpretation**: STRchive provides the reference region and typical repeat motifs for each TR disease locus, together with key details such as the normal and pathogenic allele ranges.
3. **Reveal the impact of TR diseases on children**: Through detailed literature curation, the paper shows that the majority of TR diseases affect children, challenging the view that they are mainly adult diseases. Separately, an analysis of about 18,500 gnomAD individuals genotyped at 60 disease-associated loci finds pathogenic genotypes that do not necessarily match known disease prevalence.
4. **Provide a diagnostic blueprint**: The blueprint covers allele size, sequence composition, phenotypic characteristics, and locus-specific details. The paper applies it to clinical vignettes to highlight possible pitfalls in TR variant interpretation.
5. **Support ongoing research on TR diseases**: As a "living" resource maintained by experts and open to community contributions, STRchive will be updated as understanding of TR diseases progresses.

In summary, the paper's core objective is to improve the understanding and interpretation of TR disease variants through STRchive, a consolidated resource, and thereby support more accurate diagnosis.
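
The interpretation steps summarized above (allele size, sequence composition, and locus-specific details such as age of onset) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `LocusRecord` fields, the `interpret` function, the locus name `EXAMPLE1`, and all threshold values are hypothetical and are not taken from STRchive or the paper. It is not a clinical classifier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LocusRecord:
    """Hypothetical, simplified stand-in for one curated TR disease locus."""
    gene: str
    pathogenic_motifs: frozenset   # motifs reported as pathogenic at this locus
    normal_max: int                # largest repeat count treated as normal
    pathogenic_min: int            # smallest repeat count treated as pathogenic
    onset_min_age: Optional[float] = None  # earliest reported onset, in years


def interpret(locus, motif, repeat_count, patient_age=None):
    """Return a coarse, illustrative label for one genotyped allele."""
    # Step 1: allele size against the normal range.
    if repeat_count <= locus.normal_max:
        return "within normal range"
    # Step 2: sequence composition (is the expanded motif a pathogenic one?).
    if motif not in locus.pathogenic_motifs:
        return "motif not known to be pathogenic"
    # Step 3: allele size against the pathogenic threshold.
    if repeat_count < locus.pathogenic_min:
        return "intermediate"
    # Step 4: locus-specific detail, here the age of onset.
    if (
        patient_age is not None
        and locus.onset_min_age is not None
        and patient_age < locus.onset_min_age
    ):
        return "pathogenic range, below typical onset age"
    return "pathogenic range"


# Illustrative placeholder values only; not real curated thresholds.
example_locus = LocusRecord(
    gene="EXAMPLE1",
    pathogenic_motifs=frozenset({"CAG"}),
    normal_max=30,
    pathogenic_min=40,
    onset_min_age=20.0,
)

print(interpret(example_locus, "CAG", 50, patient_age=10))
```

The age check in the sketch hints at why pathogenic genotypes in a population cohort need not match known disease prevalence: an individual may carry a pathogenic-range allele without yet showing the phenotype.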