Online algorithms for finding distinct substrings with length and multiple prefix and suffix conditions
Laurentius Leonard, Shunsuke Inenaga, Hideo Bannai, Takuya Mieno
DOI: https://doi.org/10.48550/arXiv.2207.04194
2022-10-30
Abstract: Let two static sequences of strings $P$ and $S$, representing prefix and suffix conditions respectively, be given as input for preprocessing. For the query, let two positive integers $k_1$ and $k_2$ be given, as well as a string $T$ given in an online manner, such that $T_i$ denotes the length-$i$ prefix of $T$ for $1 \leq i \leq |T|$. In this paper we are interested in computing the set $\mathit{ans_i}$ of distinct substrings $w$ of $T_i$ such that $k_1 \leq |w| \leq k_2$, and $w$ has some $p \in P$ as a prefix and some $s \in S$ as a suffix. More specifically, the counting problem is to output $|\mathit{ans_i}|$, whereas the reporting problem is to output all elements of $\mathit{ans_i}$, for each iteration $i$. Let $\sigma$ denote the alphabet size, and for a sequence of strings $A$, let $\Vert A\Vert=\sum_{u\in A}|u|$. Then, we show that after $O((\Vert P\Vert +\Vert S\Vert)\log\sigma)$-time preprocessing, the solutions for the counting and reporting problems for each iteration up to $i$ can be output in $O(|T_i| \log\sigma)$ and $O(|T_i| \log\sigma + |\mathit{ans_i}|)$ total time, respectively. The preprocessing time can be reduced to $O(\Vert P\Vert +\Vert S\Vert)$ for integer alphabets of size polynomial in $\Vert P\Vert +\Vert S\Vert$. Our algorithms have possible applications to network traffic classification.
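To make the problem statement concrete, the following is a minimal brute-force sketch of the definition of $\mathit{ans_i}$. It is not the algorithm of the paper and does not achieve the stated time bounds; it only enumerates substrings directly. The function names `ans` and `online_counts` are hypothetical and chosen for illustration.

```python
def ans(T_i, P, S, k1, k2):
    """Naive reference for ans_i (assumed helper, not the paper's method).

    Returns the set of distinct substrings w of T_i with k1 <= |w| <= k2
    such that some p in P is a prefix of w and some s in S is a suffix of w.
    """
    result = set()
    n = len(T_i)
    for start in range(n):
        # Only lengths that fit inside T_i and inside [k1, k2].
        for length in range(k1, min(k2, n - start) + 1):
            w = T_i[start:start + length]
            has_prefix = any(w.startswith(p) for p in P)
            has_suffix = any(w.endswith(s) for s in S)
            if has_prefix and has_suffix:
                result.add(w)
    return result


def online_counts(T, P, S, k1, k2):
    """Counting problem, recomputed from scratch for every prefix T_i."""
    return [len(ans(T[:i], P, S, k1, k2)) for i in range(1, len(T) + 1)]


if __name__ == "__main__":
    # T = "abab", P = ["a"], S = ["b"], lengths between 2 and 4.
    print(online_counts("abab", ["a"], ["b"], 2, 4))  # [0, 1, 1, 2]
```

In this example the qualifying substrings of the full string are `ab` and `abab`, so the count rises to 2 only at the last iteration. A sketch like this is mainly useful as a correctness oracle when testing a faster implementation.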
Data Structures and Algorithms