Abstract: In the relational database realm, there has been a shift towards novel hybrid database architectures that combine the properties of transaction processing (OLTP) and analytical processing (OLAP). OLTP workloads are made up of read and write operations on a small number of rows and are typically addressed by indexes such as B+trees. In contrast, OLAP workloads consist of large read operations that scan larger parts of the dataset. To address both workloads, some databases have introduced an architecture that uses a buffer or delta partition. Specifically, changes are accumulated in a write-optimized delta partition, while the rest of the data is compressed in the read-optimized main partition. Periodically, the delta partition is merged into the main partition. In this paper we investigate for the first time how this architecture can be implemented for RDF graphs and how it behaves. We describe in detail the indexing structures one can use for each partition, the merge process, and the transaction management. We study the performance of our triple store, which we call qEndpoint, on two popular benchmarks: the Berlin SPARQL Benchmark (BSBM) and the recent Wikidata Benchmark (WDBench). We also study how it compares against other public Wikidata endpoints. This allows us to study the behavior of the triple store under different workloads, as well as its scalability over large RDF graphs. The results show that, compared to the baselines, our triple store offers improved indexing times, better response times for some queries, higher insert and delete rates, and low disk and memory footprints, making it well suited to storing and serving large Knowledge Graphs.
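
The main/delta idea described in the abstract can be illustrated with a minimal Python sketch. This is not qEndpoint's implementation: the class `DeltaMainStore` and its methods are hypothetical names chosen for illustration, a sorted in-memory tuple stands in for the compressed, read-optimized main partition, and plain sets stand in for the write-optimized delta partition.

```python
from bisect import bisect_left


class DeltaMainStore:
    """Toy main/delta triple store (illustrative assumption, not qEndpoint code)."""

    def __init__(self, triples=()):
        # Main partition: an immutable sorted tuple, standing in for a
        # compressed, read-optimized index.
        self.main = tuple(sorted(set(triples)))
        # Delta partition: pending inserts and deletes, cheap to update.
        self.added = set()
        self.deleted = set()

    def _in_main(self, triple):
        # Binary search in the sorted main partition.
        i = bisect_left(self.main, triple)
        return i < len(self.main) and self.main[i] == triple

    def insert(self, triple):
        # Writes only touch the delta partition.
        self.deleted.discard(triple)
        if not self._in_main(triple):
            self.added.add(triple)

    def delete(self, triple):
        # Deletions of main-partition triples are recorded, not applied in place.
        self.added.discard(triple)
        if self._in_main(triple):
            self.deleted.add(triple)

    def scan(self, s=None, p=None, o=None):
        """Yield triples matching a pattern; None acts as a wildcard."""
        pattern = (s, p, o)

        def matches(triple):
            return all(q is None or q == v for q, v in zip(pattern, triple))

        # Reads combine the main partition (minus deletions) with the delta.
        for triple in self.main:
            if triple not in self.deleted and matches(triple):
                yield triple
        for triple in sorted(self.added):
            if matches(triple):
                yield triple

    def merge(self):
        # Periodic merge: fold the delta into a freshly built main partition.
        live = (set(self.main) - self.deleted) | self.added
        self.main = tuple(sorted(live))
        self.added.clear()
        self.deleted.clear()
```

In this sketch, inserts and deletes never rewrite the main partition; only `merge()` rebuilds it, which mirrors the periodic merge step the abstract mentions. A real system would also need to serve reads and writes while a merge is running, which this sketch does not attempt.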
