SPARQL Query Parallel Processing: A Survey
Jiaying Feng, Chenhong Meng, Jiaming Song, Xiaowang Zhang, Zhiyong Feng, Lei Zou
DOI: https://doi.org/10.1109/bigdatacongress.2017.65
2017-01-01
Abstract: In this paper, we survey current approaches to parallel processing of SPARQL queries over RDF data. The survey examines 15 current SPARQL query engines along three aspects that affect performance: system architecture, RDF data storage management, and SPARQL query execution strategy. We classify the 15 engines into three classes of architecture (cluster, partition, and federation), three kinds of storage (partition, graph, and DBMS), and two query execution strategies (partition and DBMS). Our investigation shows that each aspect (architecture, storage, and query execution) is a key factor in SPARQL query performance. For instance, for data storage, we evaluate three recent SPARQL query engines with different storage on the LUBM benchmark, namely TriAD (partition-based), gStoreD (graph-based), and S2RDF (DBMS-based); the experiments show that TriAD outperforms the other two engines. On the other hand, S2RDF supports Spark, which has proved to be an efficient parallel computing architecture. Since the partition-based storage of TriAD performs so well, we ask whether a partition-based storage scheme could also support Spark. Answering this question requires investigating the relations among the three factors of SPARQL query parallel processing. In this sense, our survey may help inspire new ideas for developing more efficient SPARQL query engines.