ScalaRDF: A Distributed, Elastic and Scalable In-Memory RDF Triple Store

Chunming Hu,Xixu Wang,Renyu Yang,Tianyu Wo
DOI: https://doi.org/10.1109/icpads.2016.0084
2016-01-01
Abstract:The Resource Description Framework (RDF) andSPARQL query language are gaining increasing popularity andacceptance. The ever-increasing RDF data has reached a billionscale of triples, resulting in the proliferation of distributed RDFstore systems within the Semantic Web community. However, theelasticity and performance issues are still far from settled inface of data volume explosion and workload spike. In addition, providers face great pressures to provision uninterrupted reliablestorage service whilst reducing the operational costs due to avariety of system failures. Therefore, how to efficiently realizesystem fault tolerance remains an intractable problem. In this paper, we introduce ScalaRDF, a distributed and elastic in-memoryRDF triple store to provision a fault-tolerant and scalable RDFstore and query mechanism. Specifically, we describe a consistenthashing protocol that optimizes the RDF data placement, dataoperations (especially for online RDF triple update operations)and achieves an autonomously elastic data re-distribution in theevent of cluster node joining or departing, avoiding the holisticoscillation of data storage. In addition, the data store is ableto realize a rapid and transparent failover through replicationmechanism which stores in-memory data replica in the next hashhop. The experiments demonstrate that query time and updatetime are reduced by 87% and 90% respectively compared to otherapproaches. For an 18G source dataset, the data redistributiontakes at most 60 seconds when system scales out and at most 100seconds for recovery when nodes undergo crash-stop failures.
What problem does this paper attempt to address?