Abstract: RDF is one of the most commonly used knowledge representation forms. Many highly influential knowledge bases, such as Freebase and PubChemRDF, are in RDF format. An RDF data set is usually represented as a collection of subject-predicate-object triples. Despite the flexibility of RDF triples, it is challenging to serve SPARQL queries on RDF data efficiently by directly managing triples, for two reasons. First, query processing requires heavy joins over a large number of triples, resulting in many data scans and large, redundant intermediate results. Second, the weakly-typed triple representation provides suboptimal random access, typically with logarithmic complexity. This data access challenge, unfortunately, cannot be easily met by a better query optimizer, as large graph processing is extremely I/O-intensive. In this paper, we argue that strongly-typed graph representation is the key to high-performance RDF query processing. We propose Stylus, a strongly-typed store for serving massive RDF data. Stylus exploits a strongly-typed storage scheme to boost the performance of RDF query processing. The storage scheme is essentially a materialized join view on entities; it can thus eliminate a large number of unnecessary joins on triples. Moreover, Stylus is equipped with a compact representation for intermediate results and an efficient graph-decomposition-based query planner. Experimental results on both synthetic and real-life RDF data sets confirm that the proposed approach can dramatically boost the performance of SPARQL query processing.

PVLDB Reference Format: Liang He, Bin Shao, Yatao Li, Huanhuan Xia, Yanghua Xiao, Enhong Chen, Liang Jeff Chen. Stylus: A Strongly-Typed Store for Serving Massive RDF Data. PVLDB, 11(2): xxxx-yyyy, 2018. DOI: 10.14778/3149193.3149200

∗This work was done at Microsoft Research Asia.
Articles from this volume were invited to present their results at The 44th International Conference on Very Large Data Bases, August 2018, Rio de Janeiro, Brazil. Proceedings of the VLDB Endowment, Vol. 11, No. 2. Copyright 2017 VLDB Endowment 2150-8097/17/10.
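
The abstract's central claim, that storing each entity's triples together as one record removes joins that a flat triple collection requires, can be illustrated with a toy sketch. This is not Stylus's storage format, query planner, or API: the sample data, the function names, and the dictionary-based record layout are all hypothetical and chosen only to contrast the two access patterns.

```python
from collections import defaultdict

# Hypothetical toy data: (subject, predicate, object) triples.
TRIPLES = [
    ("alice", "name", "Alice"),
    ("alice", "worksAt", "acme"),
    ("bob", "name", "Bob"),
    ("bob", "worksAt", "acme"),
    ("acme", "name", "Acme Corp"),
]


def query_by_triples(triples, employer):
    """Names of people working at `employer`, found by joining two triple scans."""
    # Scan 1: subjects that have a worksAt edge to the employer.
    workers = {s for (s, p, o) in triples if p == "worksAt" and o == employer}
    # Scan 2: join back on the subject to fetch each worker's name.
    return sorted(o for (s, p, o) in triples if p == "name" and s in workers)


def group_by_entity(triples):
    """Group triples by subject: one record per entity, predicates as fields."""
    records = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        records[s][p].append(o)
    return records


def query_by_records(records, employer):
    """Same query, but both predicates are read from a single entity record."""
    return sorted(
        name
        for rec in records.values()
        if employer in rec.get("worksAt", [])
        for name in rec.get("name", [])
    )


records = group_by_entity(TRIPLES)
# Both strategies return the same answer; only the access pattern differs.
assert query_by_triples(TRIPLES, "acme") == query_by_records(records, "acme")
```

In the triple-based version, the subject-subject join between the `worksAt` and `name` patterns is computed at query time. In the record-based version, that join was effectively done once when the records were built, which is the sense in which an entity-grouped layout acts like a materialized join view.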


[Figure: system diagram; component labels: Coordinator, Stylus Server (multiple), Query Plan, Intermediate Result, Metadata, Message Passing, Parallel In-place Access Processor.]
