Efficient SPARQL query processing in MapReduce through data partitioning and indexing

Zhi Nie, Fang Du, Yueguo Chen, Xiaoyong Du, Linhao Xu
DOI: https://doi.org/10.1007/978-3-642-29253-8_58
2012-01-01
Abstract: Processing SPARQL queries on a single node is not scalable, given the rapid growth of RDF knowledge bases. This calls for scalable solutions for SPARQL query processing over Web-scale RDF data. There have been attempts to apply SPARQL query processing techniques in MapReduce environments. However, no study has been conducted on finding optimal partitioning and indexing schemes for distributing RDF data in MapReduce. In this paper, we investigate an RDF data partitioning technique that provides effective indexing schemes to support efficient SPARQL query processing in MapReduce. Our extensive experiments over a huge real-life RDF dataset demonstrate the performance of the proposed partitioning and indexing schemes for efficient SPARQL query processing.