Localized RETE for Incremental Graph Queries

Matthias Barkowsky, Holger Giese
DOI: https://doi.org/10.1007/978-3-031-64285-2_7
2024-07-05
Abstract: Context: The growing size of graph-based modeling artifacts in model-driven engineering calls for techniques that enable efficient execution of graph queries. Incremental approaches based on the RETE algorithm provide an adequate solution in many scenarios, but are generally designed to search for query results over the entire graph. However, in certain situations, a user may only be interested in query results for a subgraph, for instance when a developer is working on a large model of which only a part is loaded into their workspace. In this case, the global execution semantics can result in significant computational overhead. Contribution: To mitigate the outlined shortcoming, in this paper we propose an extension of the RETE approach that enables local, yet fully incremental execution of graph queries, while still guaranteeing completeness of results with respect to the relevant subgraph. Results: We empirically evaluate the presented approach via experiments inspired by a scenario from software development and an independent social network benchmark. The experimental results indicate that the proposed technique can significantly improve performance regarding memory consumption and execution time in favorable cases, but may incur a noticeable linear overhead in unfavorable cases.
Software Engineering
What problem does this paper attempt to address?
This paper addresses the performance bottleneck that arises when graph queries are executed over the ever-growing graph-based models used in model-driven engineering. Existing incremental graph query techniques, such as those based on the RETE algorithm, are generally designed to search for query results over the entire graph, which can cause significant computational overhead. In some situations, however, a user is only interested in the query results for a subgraph, for example when a developer works on a large model of which only a part is loaded into their workspace. In that case, global query execution does unnecessary work.

To solve this problem, the paper proposes an extension of the RETE approach, called **Localized RETE**, that enables local yet fully incremental execution of graph queries while still guaranteeing completeness of results with respect to the relevant subgraph. The method avoids the overhead of global query execution by anchoring query execution on the relevant subgraph and lazily loading other required model elements on demand.

### Main contributions of the paper

1. **Introducing the concept of localized queries**: A relaxed definition of query result completeness is proposed. It allows local query execution on a relevant subgraph while ensuring that all matches that touch this subgraph are found.
2. **Extending the RETE mechanism**: Mark-sensitive RETE nodes and structures are introduced so that the RETE network can distinguish between local and global query execution and retrieve external elements on demand.
3. **Recursive localization process**: A recursive localization procedure converts a standard RETE network into a mark-sensitive RETE network, thereby supporting localized query execution.
4. **Performance evaluation**: The effect of Localized RETE on memory consumption and execution time is evaluated experimentally.

### Experimental results

The paper evaluates Localized RETE in experiments inspired by a software development scenario and in an independent social network benchmark. The results indicate that, in favorable cases, the method can significantly reduce memory consumption and execution time, whereas in unfavorable cases it may incur a noticeable linear overhead. Overall, the method is most beneficial for large models of which the user only works on a part.

### Formula summary

- **Definition of graph**: \[ G=(V_G, E_G, s_G, t_G) \] where \(V_G\) is the set of vertices, \(E_G\) is the set of edges, and \(s_G: E_G\rightarrow V_G\) and \(t_G: E_G\rightarrow V_G\) map each edge to its source vertex and target vertex, respectively.
- **Definition of query result completeness**: \[ \forall m\in M_Q^H: (\exists v\in V_Q: m(v)\in V_{H_p})\Rightarrow m\in M \] That is, for the query graph \(Q\), the host graph \(H\), and the set \(M_Q^H\) of matches of \(Q\) into \(H\), every match \(m\) that maps at least one query vertex into the relevant subgraph \(H_p\) must be included in the result set \(M\).
- **Mark-sensitive configuration**: \[ C_\Phi: V_N\rightarrow P(M_\Omega\times N) \] where \(N = \mathbb{N}\cup\{\infty\}\). In other words, each intermediate result is stored together with a marking \(\phi\in N\).

These definitions capture the core ideas and technical details of the Localized RETE method.
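The graph and completeness definitions above can be made concrete with a small Python sketch. It is only an illustration under stated assumptions, not the paper's implementation: the names `Graph`, `all_matches`, and `is_complete_for` are hypothetical, matches are taken to be injective vertex mappings that preserve every query edge, vertex and edge types are ignored, and `all_matches` enumerates \(M_Q^H\) by brute force, which is exactly the global work that Localized RETE is designed to avoid.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass
class Graph:
    """G = (V_G, E_G, s_G, t_G): vertices, edges, source map, target map."""
    vertices: set
    edges: set
    src: dict  # s_G: edge -> source vertex
    tgt: dict  # t_G: edge -> target vertex


def all_matches(query, host):
    """Brute-force M_Q^H (assumption: injective, edge-preserving vertex maps)."""
    host_pairs = {(host.src[e], host.tgt[e]) for e in host.edges}
    q_vertices = sorted(query.vertices)
    matches = []
    for image in permutations(sorted(host.vertices), len(q_vertices)):
        m = dict(zip(q_vertices, image))
        # Keep m only if every query edge is mapped onto some host edge.
        if all((m[query.src[e]], m[query.tgt[e]]) in host_pairs
               for e in query.edges):
            matches.append(m)
    return matches


def is_complete_for(result, query, host, relevant_vertices):
    """Relaxed completeness: every match that maps some query vertex into
    the relevant subgraph (V_{H_p}) must be contained in `result`."""
    return all(
        m in result
        for m in all_matches(query, host)
        if any(m[v] in relevant_vertices for v in query.vertices)
    )


# Host graph: a path a -> b -> c -> d. Query: a single edge x -> y.
host = Graph(
    vertices={"a", "b", "c", "d"},
    edges={"e1", "e2", "e3"},
    src={"e1": "a", "e2": "b", "e3": "c"},
    tgt={"e1": "b", "e2": "c", "e3": "d"},
)
query = Graph(vertices={"x", "y"}, edges={"q"}, src={"q": "x"}, tgt={"q": "y"})

local_result = [{"x": "a", "y": "b"}]
print(is_complete_for(local_result, query, host, {"a"}))  # True
print(is_complete_for(local_result, query, host, {"b"}))  # False
```

With the relevant subgraph reduced to vertex `a`, the single match `(a, b)` is a complete result even though the host graph contains two further matches. Once `b` is relevant, the match `(b, c)` also touches the relevant subgraph, so the same result set is no longer complete.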