Linear indexing for all strings under all internal nodes in suffix trees

Anas Al-okaily, Abdelghani Tbakhi
DOI: https://doi.org/10.1101/2021.10.25.465764
2024-01-13
Abstract: Suffix trees are a fundamental data structure in stringology. In this work, we introduce two algorithms that index all strings/suffixes under all internal nodes in a suffix tree in linear time and space. These indexes can contribute to resolving several string problems, such as DNA sequence analysis and approximate pattern matching.
Bioinformatics
What problem does this paper attempt to address?
This paper addresses the problem of indexing all substrings under the internal nodes of a suffix tree in linear time and space. It proposes two algorithms to achieve this goal, and the main contributions are:

1. **Indexing all substrings under internal nodes**: These indexes can be used to solve some string problems more efficiently, such as DNA sequence analysis and approximate pattern matching.
2. **Introducing the OSHR tree structure**: This structure assists the indexing process by ensuring that the same substring is not indexed multiple times during traversal.

By addressing these issues, the study aims to improve the efficiency of processing large amounts of string data in fields such as bioinformatics.
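To make concrete what an index of the suffixes under a node can look like, here is a minimal Python sketch. It is not the paper's algorithm and does not use the OSHR tree. It shows a standard, simpler idea: number the leaves in depth-first order so that the suffixes below any node form one contiguous slice of a single shared array. The names `Node`, `build_suffix_trie`, `index_leaves`, and `suffixes_under` are hypothetical, and the sketch assumes the text ends with a unique terminator `$`. For brevity it builds an uncompressed suffix trie in quadratic time, so only the indexing step, not the construction, reflects the linear-space idea.

```python
class Node:
    """One node of an uncompressed suffix trie (illustrative only)."""
    def __init__(self):
        self.children = {}        # char -> Node
        self.suffix_start = None  # start position of the suffix, set on leaves
        self.lo = None            # slice [lo, hi) into the shared order array
        self.hi = None


def build_suffix_trie(text):
    """Naive O(n^2) construction; assumes text ends with a unique '$'."""
    root = Node()
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.children.setdefault(ch, Node())
        node.suffix_start = start
    return root


def index_leaves(root):
    """Visit nodes depth-first and record, per node, the slice of leaves below it.

    Returns one array of suffix start positions; each node stores only two
    integers, so the index adds constant space per node.
    Recursion is fine for short demo strings; long texts would need an
    explicit stack.
    """
    order = []

    def dfs(node):
        node.lo = len(order)
        if node.suffix_start is not None:
            order.append(node.suffix_start)
        for ch in sorted(node.children):
            dfs(node.children[ch])
        node.hi = len(order)

    dfs(root)
    return order


def suffixes_under(root, order, prefix):
    """Start positions of all suffixes that begin with prefix."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return []
    return order[node.lo:node.hi]


root = build_suffix_trie("banana$")
order = index_leaves(root)
print(order)                               # [6, 5, 3, 1, 0, 4, 2]
print(suffixes_under(root, order, "ana"))  # [3, 1]
```

In a compressed suffix tree the same slice bookkeeping applies to the branching nodes only, which number at most n - 1 for a text of length n, so the whole index stays linear in the text length.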