Abstract: With the explosive growth of semantic data on the Web in recent years, many large-scale RDF knowledge bases with billions of facts are being generated. This poses significant challenges for storing and querying big RDF graphs. Current systems still have many limitations in processing big RDF graphs, including poor scalability and a lack of real-time response. In this paper, we introduce SparkRDF, an elastic discreted RDF graph processing engine with distributed memory. To reduce the high I/O and communication costs of distributed processing platforms, SparkRDF implements SPARQL queries on Spark, a novel in-memory distributed computing framework for big data processing. All intermediate results are modeled as Resilient Discreted SubGraphs, which are cached in distributed memory to support fast iterative join operations. To cut down the search space and avoid memory overhead, we split the RDF graph into small Multi-layer Elastic SubGraphs based on relations and classes. For SPARQL query optimization, SparkRDF deploys a series of optimization strategies that effectively reduce the size of intermediate results, the number of joins, and the cost of communication. Our extensive evaluation demonstrates that SparkRDF executes non-selective joins faster than both current state-of-the-art distributed and centralized stores, while processing other queries in real time and scaling linearly with the amount of data.
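The idea of splitting an RDF graph by relations and classes, and then joining only the small subgraphs a query needs, can be illustrated with a minimal single-machine sketch. It is not the SparkRDF implementation: it uses plain Python instead of Spark, and the function names (`partition`, `join_on_subject`), the two-level layout, and the sample triples are all assumptions made for illustration.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"  # assumed shorthand for the RDF type predicate


def partition(triples):
    """Split triples into per-relation and per-(relation, object class) subgraphs.

    Hypothetical two-level layout, loosely inspired by the abstract.
    """
    classes = defaultdict(set)  # entity -> set of classes it belongs to
    for s, p, o in triples:
        if p == RDF_TYPE:
            classes[s].add(o)

    by_pred = defaultdict(list)        # relation -> [(subject, object)]
    by_pred_class = defaultdict(list)  # (relation, class of object) -> [(subject, object)]
    for s, p, o in triples:
        by_pred[p].append((s, o))
        for c in classes.get(o, ()):
            by_pred_class[(p, c)].append((s, o))
    return by_pred, by_pred_class


def join_on_subject(left, right):
    """Hash join two (subject, object) lists on their shared subject."""
    index = defaultdict(list)
    for s, o in left:
        index[s].append(o)
    return [(s, lo, ro) for s, ro in right for lo in index.get(s, ())]


triples = [
    ("alice", RDF_TYPE, "Student"),
    ("bob", RDF_TYPE, "Student"),
    ("mit", RDF_TYPE, "University"),
    ("acme", RDF_TYPE, "Company"),
    ("alice", "memberOf", "mit"),
    ("bob", "memberOf", "acme"),
    ("alice", "name", "Alice"),
    ("bob", "name", "Bob"),
]

by_pred, by_pred_class = partition(triples)

# Query: ?x memberOf ?u . ?u rdf:type University . ?x name ?n
# The class-level subgraph already encodes the type constraint, so only
# one join remains and the memberOf input is smaller than the full relation.
members = by_pred_class[("memberOf", "University")]
result = join_on_subject(members, by_pred["name"])
print(result)
```

In this sketch the type pattern is folded into the choice of subgraph, which is one way the number of joins and the size of intermediate results could shrink; in a distributed setting the per-subgraph lists would be cached, partitioned datasets rather than Python lists.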

SparkRDF: Elastic Discreted RDF Graph Processing Engine with Distributed Memory
