NASTRA: Accurate analysis of short tandem repeat markers by nanopore sequencing with repeat-structure-aware algorithm

Zilin Ren, Jiarong Zhang, Yixiang Zhang, Tingting Yang, Pingping Sun, Jiguo Xue, Xiaochen Bo, Bo Zhou, Jiangwei Yan, Ming Ni
DOI: https://doi.org/10.1101/2023.11.04.565630
2024-03-30
Abstract: Forensic short tandem repeat (STR) genetic markers are multi-allelic and widely used for individual identification, kinship testing, and cell-line authentication. Nanopore sequencing, known for its portability, is emerging as a promising approach for STR typing, enabling real-time, in-field testing. However, its efficacy is often hampered by sequencing noise. Previous methods rely on alignment-based genotyping, which requires known allele references and therefore cannot be applied to unknown alleles. Here, we introduce NASTRA, an allele-reference-free tool for precise germline analysis of STR genetic markers. NASTRA incorporates a recursive algorithm that infers the repeat structures of allele sequences using only known repeat motifs. Our tests, conducted on 80 individual samples and 8 DNA standards, show that NASTRA achieves 100% accuracy in genotyping nearly all diploid STRs across various multiplex kits and flow cells. It surpasses alignment-based methods in both accuracy and speed. In a paternity testing case study, NASTRA accurately identified three relationships among six individuals within an 18-minute sequencing duration. These results underscore NASTRA's ability to perform STR analysis on both NGS and nanopore sequencing platforms, significantly enhancing the utility of nanopore sequencing in relevant applications.
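To make the idea of inferring a repeat structure from known motifs alone more concrete, here is a minimal Python sketch. It is not NASTRA's actual algorithm, which the abstract does not specify. It is an illustrative recursion that assumes a noise-free allele sequence and greedily takes maximal runs of each motif. The function names `infer_repeat_structure` and `format_structure`, and the example motifs, are hypothetical.

```python
from functools import lru_cache


def infer_repeat_structure(seq, motifs):
    """Decompose seq into (unit, count) blocks using only known motifs.

    Illustrative sketch, NOT the NASTRA implementation: a memoized recursion
    that maximizes the number of bases covered by motif runs. Bases that no
    motif explains are emitted as single-base blocks.
    Assumes seq is short enough for Python's default recursion limit.
    """
    motifs = tuple(m for m in motifs if m)  # ignore empty motifs

    @lru_cache(maxsize=None)
    def best(i):
        # Returns (bases covered by motifs, blocks) for the suffix seq[i:].
        if i == len(seq):
            return 0, ()
        result = None
        # Option 1: consume a maximal run of one known motif starting at i.
        for m in motifs:
            n = 0
            while seq.startswith(m, i + n * len(m)):
                n += 1
            if n:
                covered, blocks = best(i + n * len(m))
                cand = (covered + n * len(m), ((m, n),) + blocks)
                if result is None or cand[0] > result[0]:
                    result = cand
        # Option 2: treat seq[i] as a base outside any repeat block.
        covered, blocks = best(i + 1)
        if result is None or covered > result[0]:
            result = (covered, ((seq[i], 1),) + blocks)
        return result

    return list(best(0)[1])


def format_structure(blocks, motifs):
    """Render blocks in bracket notation, merging adjacent non-motif bases."""
    parts = []
    for unit, count in blocks:
        if unit in motifs:
            parts.append(f"[{unit}]{count}")
        elif parts and not parts[-1].startswith("["):
            parts[-1] += unit
        else:
            parts.append(unit)
    return " ".join(parts)


# Example with illustrative motifs and a synthetic allele sequence.
example_motifs = ("TCTA", "TCTG")
example_seq = "TCTA" * 3 + "TCTG" * 2 + "TCTA" * 4
print(format_structure(infer_repeat_structure(example_seq, example_motifs),
                       example_motifs))
# prints "[TCTA]3 [TCTG]2 [TCTA]4"
```

Because the decomposition needs only the motif list, no catalogue of known allele sequences is involved, which is the property the abstract contrasts with alignment-based genotyping. Handling sequencing noise, which the abstract identifies as the main obstacle for nanopore reads, is outside the scope of this sketch.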
Bioinformatics