Dynamic direct access of MSO query evaluation over strings

Pierre Bourhis, Florent Capelli, Stefan Mengel, Cristian Riveros
2024-09-26
Abstract: We study the problem of evaluating a Monadic Second Order (MSO) query over strings under updates in the setting of direct access. We present an algorithm that, given an MSO query with first-order free variables represented by an unambiguous variable-set automaton $\mathcal{A}$ with state set $Q$ and variables $X$ and a string $s$, computes a data structure in time $\mathcal{O}(|Q|^\omega\cdot |X|^2 \cdot |s|)$ and, then, given an index $i$ retrieves, using the data structure, the $i$-th output of the evaluation of $\mathcal{A}$ over $s$ in time $\mathcal{O}(|Q|^\omega \cdot |X|^3 \cdot \log(|s|)^2)$ where $\omega$ is the exponent for matrix multiplication. Ours is the first efficient direct access algorithm for MSO query evaluation over strings; such algorithms so far had only been studied for first-order queries and conjunctive queries over relational data. Our algorithm gives the answers in lexicographic order where, in contrast to the setting of conjunctive queries, the order between variables can be freely chosen by the user without degrading the runtime. Moreover, our data structure can be updated efficiently after changes to the input string, allowing more powerful updates than in the enumeration literature, e.g., efficient deletion of substrings, concatenation and splitting of strings, and cut-and-paste operations. Our approach combines a matrix representation of MSO queries and a novel data structure for dynamic word problems over semi-groups which yields an overall algorithm that is elegant and easy to formulate.
Databases, Data Structures and Algorithms, Formal Languages and Automata Theory
What problem does this paper attempt to address?
This paper addresses **dynamic direct access for Monadic Second Order (MSO) query evaluation over strings**. Specifically, the authors study how to access the answers of an MSO query over a string by index while the string is being updated, and they propose an algorithm for this problem.

### Detailed Explanation:

1. **Background and Motivation**:
   - A direct access algorithm represents the set of query answers compactly, without materializing it, while still allowing the answers to be accessed by index as if they were stored in an array.
   - Such algorithms work in two phases. The preprocessing phase builds a data structure from the query and the input; the access phase uses that structure to retrieve the $i$-th answer for a given index $i$.

2. **Existing Work and Challenges**:
   - Direct access algorithms had so far only been studied for first-order queries and conjunctive queries over relational data.
   - MSO queries have been studied extensively in other settings, such as enumeration, but no efficient direct access algorithm for MSO query evaluation over strings was known.

3. **Main Contributions of the Paper**:
   - A new direct access algorithm that efficiently evaluates MSO queries with free first-order variables over strings. The query is given as an unambiguous variable-set automaton.
   - The time complexity of this algorithm is:
     - Preprocessing time: \( O(|Q|^\omega \cdot |X|^2 \cdot |s|) \)
     - Access time: \( O(|Q|^\omega \cdot |X|^3 \cdot \log^2(|s|)) \)
   - Here, \( |Q| \) is the number of states of the automaton, \( |X| \) is the number of variables, \( |s| \) is the length of the string, and \( \omega \) is the exponent of matrix multiplication.

4. **Innovative Points**:
   - **Support for Arbitrary Variable Order**: Answers are returned in lexicographic order, and, unlike for conjunctive queries, the user can freely choose the order between variables without degrading the running time.
   - **Efficient Updates**: The data structure can be updated efficiently after changes to the input string, including deletion of substrings, concatenation and splitting of strings, and cut-and-paste operations.
   - **Concise Data Structure**: The approach combines a matrix representation of MSO queries with a new data structure for dynamic word problems over semigroups, which makes the overall algorithm elegant and easy to formulate.

5. **Technical Details**:
   - The direct access problem is reduced to a counting problem, and binary search is used to locate the answer at the requested index.
   - The counting problem is expressed as a product of matrices, and the value of this product is maintained efficiently when individual matrices are replaced.
   - The matrix products are stored in an augmented binary search tree so that the data structure remains efficient under updates.

In conclusion, the paper proposes a direct access algorithm for MSO queries over strings that also supports efficient dynamic updates. According to the abstract, it is the first efficient direct access algorithm for MSO query evaluation over strings.
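
The counting-by-matrix-product idea from the technical details can be illustrated with a small Python sketch. It is not the paper's algorithm. It assumes a single variable `x` and a hypothetical hand-written three-state automaton for the query "position `x` holds an `a` and some later position holds a `b`". It uses a static segment tree instead of the paper's data structure, so it supports only single-character replacement, and it locates the `k`-th answer by descending the tree rather than by binary search. All names (`DirectAccess`, `letter_matrices`, and so on) are made up for this illustration.

```python
# States of the illustrative automaton: 0 = x not yet placed,
# 1 = x placed and no 'b' seen since, 2 = accepting.
N = 3
IDENT = [[int(i == j) for j in range(N)] for i in range(N)]
ZERO = [[0] * N for _ in range(N)]

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)] for i in range(N)]

def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def row_times(u, m):  # row vector times matrix
    return [sum(u[i] * m[i][j] for i in range(N)) for j in range(N)]

def times_col(m, v):  # matrix times column vector
    return [sum(m[i][j] * v[j] for j in range(N)) for i in range(N)]

def letter_matrices(c):
    """Transition matrices for letter c: (read without x, read with x placed here)."""
    plain = [[1, 0, 0], [0, 0, 1] if c == "b" else [0, 1, 0], [0, 0, 1]]
    marked = [[0, 1, 0] if c == "a" else [0, 0, 0], [0, 0, 0], [0, 0, 0]]
    return plain, marked

class DirectAccess:
    def __init__(self, s):
        self.size = 1
        while self.size < len(s):
            self.size *= 2
        # m0[v]: product over v's segment with x placed nowhere in it;
        # m1[v]: sum over all ways to place x exactly once in the segment.
        # Padding leaves hold (identity, zero) so they contribute nothing.
        self.m0 = [IDENT] * (2 * self.size)
        self.m1 = [ZERO] * (2 * self.size)
        for i, c in enumerate(s):
            self.m0[self.size + i], self.m1[self.size + i] = letter_matrices(c)
        for v in range(self.size - 1, 0, -1):
            self._pull(v)

    def _pull(self, v):
        l, r = 2 * v, 2 * v + 1
        self.m0[v] = mat_mul(self.m0[l], self.m0[r])
        self.m1[v] = mat_add(mat_mul(self.m1[l], self.m0[r]), mat_mul(self.m0[l], self.m1[r]))

    def update(self, i, c):
        """Replace the character at position i and repair the ancestors."""
        v = self.size + i
        self.m0[v], self.m1[v] = letter_matrices(c)
        v //= 2
        while v >= 1:
            self._pull(v)
            v //= 2

    def count(self):
        # Runs from state 0 to state 2; the automaton is deterministic,
        # so the number of accepting runs equals the number of answers.
        return self.m1[1][0][2]

    def access(self, k):
        """Return the k-th answer (0-based) in increasing position order."""
        if not 0 <= k < self.count():
            raise IndexError(k)
        u, w = [1, 0, 0], [0, 0, 1]  # contexts to the left and right of the node
        v = 1
        while v < self.size:
            l, r = 2 * v, 2 * v + 1
            right = times_col(self.m0[r], w)
            in_left = sum(a * b for a, b in zip(row_times(u, self.m1[l]), right))
            if k < in_left:
                w, v = right, l
            else:
                k -= in_left
                u, v = row_times(u, self.m0[l]), r
        return v - self.size

da = DirectAccess("abab")
print([da.access(k) for k in range(da.count())])  # [0, 2]
da.update(3, "a")  # the string is now "abaa"
print([da.access(k) for k in range(da.count())])  # [0]
```

In this sketch each update recomputes one root-to-leaf path of matrix products, and each access performs a logarithmic number of vector-matrix products. Handling several variables in a user-chosen order, and updates such as splitting or concatenating strings, requires the machinery described in the paper and is not attempted here.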