Abstract: As RDF data continue to gain popularity, we witness a fast-growing trend in both the number of RDF repositories and the size of RDF datasets. Many known RDF datasets contain billions of RDF triples (subject, predicate, object). One of the grand challenges in managing such huge RDF data is how to execute RDF queries efficiently. In this paper, we address query processing for the billion-triple challenge. We first identify several causes of the problems in existing query optimization schemes, such as large intermediate results and errors in the initial query cost estimation. Then, we present our block-oriented dynamic query plan generation approach, powered by pipelined execution. Our approach consists of two phases. In the first phase, a near-optimal execution plan is chosen by identifying the processing blocks of a query: we group the join patterns that share a join variable into the building blocks of the query plan, since executing these blocks first provides opportunities to reduce the size of the intermediate results generated. In the second phase, we further optimize the initial pipelining for a given query plan. We employ optimization techniques such as sideways information passing and semi-joins to further reduce the size of intermediate results, improve query processing cost estimation, and speed up query execution. Experimental results on several RDF datasets of over a billion triples demonstrate that our approach outperforms existing RDF query engines that rely on static query processing strategies based on dynamic programming.
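To make the two ideas in the abstract concrete (grouping join patterns that share a join variable into blocks, then shrinking intermediate results with sideways information passing and a semi-join), here is a minimal single-machine sketch. It is not the authors' implementation. It assumes an in-memory list of triples, terms prefixed with `?` as variables, and patterns evaluated in the order given (no cost-based ordering or pipelining). All function names (`group_into_blocks`, `match`, `evaluate_block`) are hypothetical.

```python
from collections import defaultdict


def is_var(term):
    # Assumed convention for this sketch: variables start with "?".
    return term.startswith("?")


def group_into_blocks(patterns):
    """Map each join variable (one occurring in >= 2 patterns) to the
    indices of the triple patterns that contain it."""
    occurrences = defaultdict(list)
    for i, pattern in enumerate(patterns):
        for var in sorted({t for t in pattern if is_var(t)}):
            occurrences[var].append(i)
    return {v: idx for v, idx in occurrences.items() if len(idx) >= 2}


def match(pattern, triples, bound=None):
    """Scan triples matching `pattern`. `bound` maps a variable to the set
    of values it may take (candidates passed sideways from earlier scans)."""
    bound = bound or {}
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if not is_var(term):
                if term != value:
                    break  # constant term does not match
            elif term in bound and value not in bound[term]:
                break  # pruned by sideways information passing
            elif binding.setdefault(term, value) != value:
                break  # same variable bound to two different values
        else:
            results.append(binding)
    return results


def evaluate_block(var, patterns, triples):
    """Evaluate one block of patterns sharing join variable `var`."""
    candidates = None
    per_pattern = []
    for pattern in patterns:
        # Sideways information passing: restrict this scan to values of
        # `var` that survived all previous scans in the block.
        bound = {var: candidates} if candidates is not None else {}
        rows = match(pattern, triples, bound)
        candidates = {r[var] for r in rows}
        per_pattern.append(rows)
    # Semi-join: drop earlier rows whose join value was eliminated later.
    filtered = [[r for r in rows if r[var] in candidates] for rows in per_pattern]
    # Join the reduced inputs on all shared variables.
    joined = [{}]
    for rows in filtered:
        joined = [{**a, **b} for a in joined for b in rows
                  if all(a.get(k, v) == v for k, v in b.items())]
    return joined


triples = [
    ("alice", "knows", "bob"),
    ("alice", "worksAt", "acme"),
    ("bob", "worksAt", "acme"),
    ("carol", "knows", "bob"),
    ("dave", "worksAt", "initech"),
]
patterns = [("?x", "knows", "?y"), ("?x", "worksAt", "?c")]

blocks = group_into_blocks(patterns)
for var, idx in blocks.items():
    print(var, evaluate_block(var, [patterns[i] for i in idx], triples))
```

In this toy run, the second scan only considers subjects `alice` and `carol` passed from the first, so `bob` and `dave` are never materialized, and the semi-join then removes `carol`'s row before the final join. At billion-triple scale the same pruning is what keeps intermediate results small.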

Fast Processing SPARQL Queries on Large RDF Data

Dynamic and Fast Processing of Queries on Large-Scale RDF Data

Grace: An Efficient Parallel SPARQL Query System over Large-Scale RDF Data

Implementation of large-scale distributed information retrieval system

SparkRDF: Elastic Discreted RDF Graph Processing Engine with Distributed Memory

Towards Efficient SPARQL Query Processing on RDF Data

TripleBit: a fast and compact system for large scale RDF data

Efficient SPARQL Query Evaluation Via Automatic Data Partitioning

SPARQL Query Parallel Processing: A Survey

Query Optimization for Massive RDF Data Based on Spark

Scalable RDF store based on HBase and MapReduce

Efficient SPARQL Query Evaluation in a Database Cluster

RDF partitioning for scalable SPARQL query processing

HadoopRDF: a scalable semantic data analytical engine

Scalable SPARQL Querying Using Path Partitioning

Scalable RDF Graph Querying Using Cloud Computing

Implementation and Optimization of RDF Query Using Hadoop

Scalable SPARQL querying processing on large RDF data in cloud computing environment

A Distributed Graph Engine for Web Scale RDF Data

High Performance RDF Updates with TripleBit+

Processing SPARQL Queries over Distributed RDF Graphs