Extending the Burrows-Wheeler Transform for Cartesian Tree Matching and Constructing It

Eric M. Osterkamp, Dominik Köppl
2024-11-19
Abstract: Cartesian tree matching is a form of generalized pattern matching where a substring of the text matches with the pattern if they share the same Cartesian tree. This form of matching finds application for time series of stock prices and can be of interest for melody matching between musical scores. For the indexing problem, the state-of-the-art data structure is a Burrows-Wheeler transform based solution due to [Kim and Cho, CPM'21], which uses nearly succinct space and can count the number of substrings that Cartesian tree match with a pattern in time linear in the pattern length. The authors address the construction of their data structure with a straightforward solution that, however, requires pointer-based data structures, which asymptotically need more space than compact solutions [Kim and Cho, CPM'21, Section A.4]. We address this bottleneck by a construction that requires compact space and has a time complexity linear in the product of the text length with some logarithmic terms. Additionally, we can extend this index for indexing multiple circular texts in the spirit of the extended Burrows-Wheeler transform without sacrificing the time and space complexities. We present this index in a dynamic variant, where we pay a logarithmic slowdown and need compact space for the extra functionality that we can incrementally add texts. Our extended setting is of interest for finding repetitive motifs common in the aforementioned applications, independent of offsets and scaling.
Data Structures and Algorithms
### What problems does this paper attempt to solve?

This paper addresses index construction for **Cartesian tree matching** and extends the Burrows-Wheeler transform (BWT) to support this kind of matching. Specifically, it addresses the following issues:

1. **Construction bottleneck of the existing index**:
   - The state-of-the-art BWT-based index of Kim and Cho (CPM'21) uses nearly succinct space and counts the substrings that Cartesian tree match a pattern in time linear in the pattern length. However, its construction algorithm relies on pointer-based data structures, which asymptotically need more space than compact solutions.
   - The paper proposes a construction that works in compact space, with a time complexity linear in the product of the text length and some logarithmic terms.
2. **Multi-text indexing**:
   - Existing indexing methods only partially solve the problem of indexing multiple texts, which makes it difficult to detect whether a pattern is a repeated fragment of the input texts. This detection matters in applications such as melody matching.
   - The paper extends the index to multiple circular texts, in the spirit of the extended BWT, without sacrificing the time and space complexities. Repeated motifs can then be found independent of offsets and scaling.
3. **Dynamic indexing**:
   - The paper presents the index in a dynamic variant that supports incrementally adding texts. This variant still needs only compact space and incurs a logarithmic slowdown.
4. **Applications**:
   - Cartesian tree matching is applied to time series of stock prices and can be of interest for melody matching between musical scores. For example, it can identify common melodic patterns regardless of offsets and scaling.
### Summary

By improving and extending the existing BWT index structure, this paper solves the bottlenecks of existing methods in terms of space efficiency and multi-text indexing, thereby improving the efficiency and applicability of Cartesian tree matching. This is not only helpful for time-series data analysis, but also applicable to fields such as music melody matching.
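To make the notion of Cartesian tree matching concrete, the sketch below counts matches naively by comparing parent-distance encodings, a standard way to test whether two sequences share the same Cartesian tree. It is only an illustration and not the BWT-based index or its construction from the paper. The function names `parent_distances` and `count_ct_matches` are hypothetical, and ties are assumed to be broken in favour of the leftmost minimum.

```python
def parent_distances(seq):
    """For each position i, return the distance to the nearest position j < i
    with seq[j] <= seq[i], or 0 if no such position exists.

    Assumption: ties are broken towards the leftmost minimum, so that two
    sequences have the same Cartesian tree exactly when these encodings agree.
    """
    stack = []   # indices whose values are non-decreasing from bottom to top
    result = []
    for i, value in enumerate(seq):
        # Discard earlier positions holding strictly larger values.
        while stack and seq[stack[-1]] > value:
            stack.pop()
        result.append(i - stack[-1] if stack else 0)
        stack.append(i)
    return result


def count_ct_matches(text, pattern):
    """Naively count the substrings of text that Cartesian tree match pattern.

    Runs in O(len(text) * len(pattern)) time; an index as discussed above
    would answer the same query in time linear in the pattern length.
    """
    m = len(pattern)
    target = parent_distances(pattern)
    return sum(
        parent_distances(text[start:start + m]) == target
        for start in range(len(text) - m + 1)
    )


if __name__ == "__main__":
    pattern = [2, 1, 3]
    # [12, 11, 13] is the pattern shifted by 10; [40, 20, 60] is it scaled by 20.
    text = [12, 11, 13, 40, 20, 60, 5]
    print(parent_distances(pattern))        # [0, 0, 1]
    print(count_ct_matches(text, pattern))  # 2
```

The example text contains one shifted and one scaled copy of the pattern, and both are reported as matches. This illustrates the independence of offsets and scaling that the summary mentions.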