Single-Source Regular Path Querying in Terms of Linear Algebra

Georgiy Belyanin, Semyon Grigoriev
2024-12-14
Abstract: Given an edge-labelled graph, two-way regular path queries (2-RPQs) allow one to use regular languages over labelled edges and inverted edges to constrain paths of interest. 2-RPQs are (partially) adopted in different real-world graph analysis systems and are a part of the GQL ISO standard. But the performance of 2-RPQs on real-world graphs is still a bottleneck for wider adoption. A new single-source 2-RPQ algorithm based on linear algebra is proposed. Implementing the algorithm on top of high-performance sparse linear algebra libraries allows one to achieve significant speedup over competitors on real-world data and queries. Our implementation demonstrates better performance on average on Wikidata and the respective query log in comparison with MillenniumDB, FalkorDB, and the algorithm of Diego Arroyuelo et al.
Data Structures and Algorithms
What problem does this paper attempt to address?
The problem this paper attempts to solve is: **improving the performance of single-source two-way regular path queries (2-RPQs) on large-scale real-world graph data**. Specifically, the authors propose a new linear-algebra-based algorithm (LA 2-RPQ) to accelerate the computation of single-source 2-RPQs. The algorithm is implemented with efficient sparse linear-algebra libraries (such as SuiteSparse:GraphBLAS) and significantly outperforms existing methods on real-world data and queries.

### Main contributions of the paper

1. **Design and proof of the new algorithm**:
   - Propose a single-source 2-RPQ algorithm (LA 2-RPQ) inspired by breadth-first search (BFS) and based on linear-algebra operations, especially operations on sparse Boolean matrices.
   - Prove the correctness of this algorithm.
2. **Performance evaluation**:
   - Implement the algorithm using SuiteSparse:GraphBLAS and evaluate it on real-world datasets (such as Wikidata).
   - Compare its performance with existing linear-algebra baseline solutions (such as FalkorDB, RPQ-matrix) and the state-of-the-art graph database MillenniumDB.
   - The results show that the LA 2-RPQ algorithm is 3.9 to 19.6 times faster than its competitors on most queries, and that it completes all queries within a 1-minute time limit, while the competing solutions sometimes time out.

### Background and motivation

- **Importance of 2-RPQ**: 2-RPQs constrain paths in a graph with a regular language. They are used in graph query languages (such as Cypher and PGQL) and are part of the ISO standard GQL.
- **Performance bottleneck**: Although 2-RPQs have been studied intensively in theory, their performance in practice is still a bottleneck, especially on large-scale graph data.
### Technical details

- **Linear-algebra representation**: Transform the operations on graphs and automata into linear-algebra operations on matrices and vectors, and use sparse-matrix libraries for efficient computation.
- **Parallelization**: Use modern high-performance-computing libraries (such as SuiteSparse:GraphBLAS) to achieve parallelization, thereby improving computational efficiency.

### Future work

- Explore the application of known optimization techniques, such as rare-label utilization or push-pull optimization.
- Research the linear-algebra expression of multi-source BFS and its potential improvement to the algorithm.
- Explore the impact of GPU acceleration and distributed computing on performance.

In conclusion, this paper aims to significantly improve the query performance of single-source 2-RPQ on large-scale graph data by introducing a new linear-algebra-based algorithm.
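
### Illustrative sketch

To make the linear-algebra view of a BFS-style single-source 2-RPQ more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation and does not use SuiteSparse:GraphBLAS. It assumes the graph and the query automaton are each decomposed into one sparse Boolean matrix per label, stored here as dictionaries of sets, and that a label written `^a` means edge `a` traversed backwards. All names (`single_source_rpq`, `transpose`, the `^` prefix, the example data) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def transpose(matrix):
    """Transpose a sparse Boolean matrix stored as {row: {columns}}."""
    result = defaultdict(set)
    for row, cols in matrix.items():
        for col in cols:
            result[col].add(row)
    return dict(result)


def single_source_rpq(graph, nfa, start_state, final_states, source):
    """Vertices reachable from `source` along a path accepted by the NFA.

    graph: {label: {u: {v, ...}}}   one Boolean adjacency matrix per label
    nfa:   {label: {q: {q2, ...}}}  one Boolean transition matrix per label
    """
    # Inverse labels (assumed "^" prefix) use the transposed adjacency matrix.
    matrices = dict(graph)
    for label in nfa:
        if label.startswith("^"):
            matrices[label] = transpose(graph.get(label[1:], {}))

    # Frontier and visited are Boolean matrices of shape states x vertices,
    # stored as {state: {vertices}}.
    frontier = {start_state: {source}}
    visited = {start_state: {source}}
    while frontier:
        next_frontier = defaultdict(set)
        for label, delta in nfa.items():
            adjacency = matrices.get(label, {})
            for state, vertices in frontier.items():
                for next_state in delta.get(state, ()):
                    # Row of the product: frontier[state] times adjacency.
                    for u in vertices:
                        next_frontier[next_state] |= adjacency.get(u, set())
        # Mask out (state, vertex) pairs that were already visited.
        frontier = {}
        for state, vertices in next_frontier.items():
            fresh = vertices - visited.get(state, set())
            if fresh:
                frontier[state] = fresh
                visited.setdefault(state, set()).update(fresh)

    answer = set()
    for state in final_states:
        answer |= visited.get(state, set())
    return answer


# Example query a* / ^b: follow any number of a-edges, then one b-edge backwards.
graph = {"a": {0: {1}, 1: {2}}, "b": {3: {2}, 4: {0}, 5: {9}}}
nfa = {"a": {0: {0}}, "^b": {0: {1}}}
answer = single_source_rpq(graph, nfa, start_state=0, final_states={1}, source=0)
print(sorted(answer))  # [3, 4]
```

In a sparse linear-algebra library the three innermost loops would typically collapse into a Boolean matrix product per label followed by a mask with the complement of `visited`, which is where the parallelism mentioned above would come from.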