Leon: A Distributed RDF Engine for Multi-Query Processing

Xintong Guo, Hong Gao, Zhaonian Zou
DOI: https://doi.org/10.1007/978-3-030-18576-3_44
2019-01-01
Abstract: Although similar queries recur frequently in real query logs, few RDF systems exploit this overlap. In this paper, we propose Leon, a distributed RDF system that also handles the multi-query problem. First, we apply a characteristic-set-based partitioning scheme. This scheme (i) supports fully parallel processing of joins within characteristic sets and (ii) minimizes data communication by transmitting intermediate results directly instead of broadcasting them. Then, Leon revisits the classical problem of multi-query optimization (MQO) in the context of RDF/SPARQL. Because multi-query optimization for SPARQL is NP-hard, we propose a heuristic algorithm that partitions the input batch of queries into groups and discovers the common sub-queries of multiple SPARQL queries. Our MQO algorithm incorporates a carefully designed cost model to generate execution plans. Our experiments with synthetic and real datasets verify that (i) Leon's startup overhead is low; (ii) Leon consistently outperforms centralized RDF engines by 1-2 orders of magnitude and is competitive with state-of-the-art distributed RDF engines; and (iii) our MQO approach consistently demonstrates a 10x speedup over the baseline method.
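To make the partitioning idea concrete, the following is a minimal sketch of grouping triples by characteristic set, using the usual definition in which a subject's characteristic set is the set of predicates appearing on its outgoing triples. It is not Leon's actual scheme: the function names, the tuple representation of triples, and the toy data are all assumptions made for illustration, and the abstract does not specify how partitions are assigned to machines.

```python
from collections import defaultdict


def characteristic_sets(triples):
    """Map each subject to the frozenset of predicates it emits."""
    predicates = defaultdict(set)
    for subject, predicate, _obj in triples:
        predicates[subject].add(predicate)
    return {s: frozenset(ps) for s, ps in predicates.items()}


def partition_by_characteristic_set(triples):
    """Group triples so that subjects sharing a characteristic set
    land in the same partition (hypothetical helper, not Leon's API)."""
    cs_of = characteristic_sets(triples)
    partitions = defaultdict(list)
    for subject, predicate, obj in triples:
        partitions[cs_of[subject]].append((subject, predicate, obj))
    return dict(partitions)


# Toy data: two "person-like" subjects and one "company-like" subject.
triples = [
    ("alice", "name", "Alice"),
    ("alice", "worksAt", "acme"),
    ("bob", "name", "Bob"),
    ("bob", "worksAt", "acme"),
    ("acme", "locatedIn", "berlin"),
]
partitions = partition_by_characteristic_set(triples)
```

In this sketch, a star-shaped join on one subject (for example, `?x name ?n . ?x worksAt ?c`) touches only the partition whose key contains both predicates, which is the intuition behind evaluating joins within a characteristic set in parallel.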