Average Size of a Suffix Tree for Markov Sources

Philippe Jacquet, Wojciech Szpankowski
DOI: https://doi.org/10.48550/arXiv.1605.02123
2016-05-07
Abstract: We study a suffix tree built from a sequence generated by a Markovian source. Such sources are more realistic probabilistic models for text generation, data compression, molecular applications, and so forth. We prove that the average size of such a suffix tree is asymptotically equivalent to the average size of a trie built over $n$ independent sequences from the same Markovian source. This equivalence was previously known only for memoryless sources. We then derive a formula for the size of a trie under a Markovian model to complete the analysis for suffix trees. We accomplish our goal by applying some novel techniques of analytic combinatorics on words, also known as analytic pattern matching.
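The equivalence stated in the abstract can be explored empirically with a small simulation. The sketch below is not the authors' method: it assumes a binary first-order Markov chain with an illustrative transition matrix `P`, takes "size" to mean the number of internal nodes of the non-compacted trie (an assumption about the paper's definition), and truncates every string at a fixed `depth`, so any branching below that depth is not counted. The names `markov_sequence`, `trie_size`, and `suffix_tree_size` are hypothetical helpers introduced only for illustration.

```python
import random
from collections import Counter


def markov_sequence(length, transition, rng):
    """Generate a binary sequence from a first-order Markov chain.

    transition[s][0] is the probability of moving from state s to state 0.
    The start state is uniform (an assumption), not the stationary law.
    """
    state = rng.randrange(2)
    out = []
    for _ in range(length):
        out.append(state)
        state = 0 if rng.random() < transition[state][0] else 1
    return tuple(out)


def trie_size(strings, depth):
    """Count internal trie nodes: prefixes (including the empty one)
    of length <= depth that are shared by at least two strings."""
    counts = Counter()
    for s in strings:
        for k in range(min(depth, len(s)) + 1):
            counts[s[:k]] += 1
    return sum(1 for c in counts.values() if c >= 2)


def suffix_tree_size(text, n, depth):
    """Trie size over the first n suffixes of text, each cut at depth."""
    suffixes = [text[i:i + depth] for i in range(n)]
    return trie_size(suffixes, depth)


if __name__ == "__main__":
    rng = random.Random(2016)
    P = [[0.7, 0.3], [0.4, 0.6]]  # illustrative transition matrix (assumption)
    n, depth, trials = 300, 40, 20

    suffix_total = 0
    trie_total = 0
    for _ in range(trials):
        # One long text: its first n suffixes form the suffix tree.
        text = markov_sequence(n + depth, P, rng)
        suffix_total += suffix_tree_size(text, n, depth)
        # n independent sequences from the same source form the trie.
        independent = [markov_sequence(depth, P, rng) for _ in range(n)]
        trie_total += trie_size(independent, depth)

    print("average suffix tree size:", suffix_total / trials)
    print("average independent trie size:", trie_total / trials)
```

Running the script prints the two empirical averages for one choice of `n`; the result in the abstract is asymptotic, so a small-scale run like this can only suggest the trend, not confirm it.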
Data Structures and Algorithms