Abstract: RDF is one of the most commonly used forms of knowledge representation. Many highly influential knowledge bases, such as Freebase and PubChemRDF, are in RDF format. An RDF data set is usually represented as a collection of subject-predicate-object triples. Despite the flexibility of RDF triples, it is challenging to serve SPARQL queries on RDF data efficiently by directly managing triples, for two reasons. First, query processing requires heavy joins over a large number of triples, resulting in many data scans and large, redundant intermediate results. Second, the weakly-typed triple representation provides suboptimal random access, typically with logarithmic complexity. This data access challenge, unfortunately, cannot be easily met by a better query optimizer, as large graph processing is extremely I/O-intensive. In this paper, we argue that a strongly-typed graph representation is the key to high-performance RDF query processing. We propose Stylus, a strongly-typed store for serving massive RDF data. Stylus exploits a strongly-typed storage scheme to boost the performance of RDF query processing. The storage scheme is essentially a materialized join view on entities; it can thus eliminate a large number of unnecessary joins on triples. Moreover, Stylus is equipped with a compact representation for intermediate results and an efficient graph-decomposition-based query planner. Experimental results on both synthetic and real-life RDF data sets confirm that the proposed approach can dramatically boost the performance of SPARQL query processing.

PVLDB Reference Format: Liang He, Bin Shao, Yatao Li, Huanhuan Xia, Yanghua Xiao, Enhong Chen, Liang Jeff Chen. Stylus: A Strongly-Typed Store for Serving Massive RDF Data. PVLDB, 11(2): xxxx-yyyy, 2018. DOI: 10.14778/3149193.3149200

∗This work was done at Microsoft Research Asia.
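The abstract's claim that a per-entity record behaves like a materialized join view can be illustrated with a toy sketch. This is not the Stylus storage scheme itself: the data, the function names, and the dictionary-based record layout are all hypothetical and chosen only to show why grouping triples by entity removes a join on the subject.

```python
from collections import defaultdict

# Hypothetical toy data; identifiers are illustrative, not from the paper.
triples = [
    ("alice", "type", "Person"),
    ("alice", "name", "Alice"),
    ("alice", "worksAt", "acme"),
    ("bob", "type", "Person"),
    ("bob", "name", "Bob"),
    ("bob", "worksAt", "acme"),
    ("acme", "type", "Company"),
]


def query_on_triples(triples, company):
    """Names of people working at `company`, via a self-join on subject."""
    # First scan: subjects that have a worksAt edge to the company.
    workers = {s for s, p, o in triples if p == "worksAt" and o == company}
    # Second scan: join back on subject to fetch the name triples.
    return sorted(o for s, p, o in triples if p == "name" and s in workers)


def group_by_entity(triples):
    """Build one record per subject, mapping predicate -> list of objects."""
    records = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        records[s][p].append(o)
    return {s: dict(preds) for s, preds in records.items()}


def query_on_records(records, company):
    """Same query, answered in one pass over entity records with no join."""
    return sorted(
        name
        for rec in records.values()
        if company in rec.get("worksAt", [])
        for name in rec.get("name", [])
    )


records = group_by_entity(triples)
```

Both query functions return the same answer; the difference is that the record-based version reads each entity's predicates together instead of matching separate triples on their subject.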
Articles from this volume were invited to present their results at The 44th International Conference on Very Large Data Bases, August 2018, Rio de Janeiro, Brazil. Proceedings of the VLDB Endowment, Vol. 11, No. 2. Copyright 2017 VLDB Endowment 2150-8097/17/10.

[Figure: diagram whose surviving labels are Message Passing, Parallel Inplace Access Processor, Stylus Server, Coordinator, Query Plan, Intermediate Result, and Metadata.]
