MotifScope: a multi-sample motif discovery and visualization tool for tandem repeats

Yaran Zhang, Marc Hulsman, Alex Salazar, Niccolò Tesi, Lydian Knoop, Sven van der Lee, Sanduni Wijesekera, Jana Krizova, Erik-Jan Kamsteeg, Henne Holstege
DOI: https://doi.org/10.1101/2024.03.06.583591
2024-03-11
Abstract: Tandem repeats (TRs) constitute a significant portion of the human genome and are highly polymorphic because they vary in both size and motif composition. These variations have been associated with several neuropathological disorders, underscoring the clinical importance of TRs. Furthermore, the motif structure of these repeats can offer valuable insights into evolutionary dynamics and population structure. However, analysis of TRs has been hampered by the limitations of short-read sequencing technology, which cannot fully capture the complexity of these sequences. With long-read data becoming more accessible, there is now a need for tools to explore and characterize these TRs. In this study, we introduce MotifScope, a novel algorithm for visualization of TRs in their population context, based on a de novo k-mer approach for motif discovery. Comparative analysis against three established tools, uTR, TRF, and vamos, reveals that MotifScope can identify a greater number of motifs and more accurately represent the actual repeat sequence. Additionally, MotifScope enables comparison of sequencing reads within an individual and of assemblies across different individuals, showing its applicability in a range of genomic contexts. We demonstrate potential applications of MotifScope in population genetics, clinical settings, and forensic analyses.
Genomics