Abstract: As the volume of RDF data grows, a distributed database system becomes essential for managing it. In distributed RDF data design, it is common to partition the data into parts, called fragments, which are then distributed across sites. The distribution design thus consists of two steps: fragmentation and allocation. In this paper, we propose a method that exploits the intrinsic similarities among the structures of queries in a workload to guide fragmentation and allocation, with the aim of reducing the number of crossing matches and the communication cost during SPARQL query processing. Specifically, we mine and select frequent access patterns that reflect the characteristics of the workload. We prove that selecting the optimal set of frequent access patterns is NP-hard, and we propose a heuristic algorithm that guarantees both data integrity and an approximation ratio. Based on the selected frequent access patterns, we propose two fragmentation strategies, vertical and horizontal, that divide RDF graphs to meet different query processing objectives: vertical fragmentation targets better throughput, and horizontal fragmentation targets better performance. After fragmentation, we discuss how to allocate the fragments to the sites. Finally, we discuss how to process a query based on the results of fragmentation and allocation. Extensive experiments confirm the superior performance of our proposed solutions.
