Abstract: With the explosive growth of semantic data on the Web in recent years, many large-scale RDF knowledge bases with billions of facts are being generated. This poses significant challenges for storing and querying big RDF graphs. Current systems still have many limitations in processing big RDF graphs, notably in scalability and real-time response. In this paper, we introduce SparkRDF, an elastic discreted RDF graph processing engine with distributed memory. To reduce the high I/O and communication costs of distributed processing platforms, SparkRDF implements SPARQL query processing on Spark, a novel in-memory distributed computing framework for big data processing. All intermediate results are modeled as Resilient Discreted SubGraphs, which are cached in distributed memory to support fast iterative join operations. To cut down the search space and limit memory overhead, we split the RDF graph into small Multi-layer Elastic SubGraphs based on relations and classes. For SPARQL query optimization, SparkRDF deploys a series of optimization strategies that reduce the size of intermediate results, the number of joins, and the cost of communication. Our extensive evaluation demonstrates that SparkRDF executes non-selective joins faster than both state-of-the-art distributed and centralized stores, while processing other queries in real time and scaling linearly with the amount of data.
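
The idea of splitting an RDF graph by relations and classes, then joining only the relevant pieces, can be illustrated with a small single-machine sketch. This is not SparkRDF's implementation: the function names (`build_index`, `scan`, `join_on_subject`), the sample triples, and the in-memory dictionaries are all assumptions made for illustration, and plain Python stands in for Spark's distributed, cached datasets.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"  # assumed shorthand for the RDF type predicate


def build_index(triples):
    """Split triples into per-predicate groups and per-class member sets.

    Hypothetical layout: one (subject, object) list per relation, plus a
    set of instances per class, so a query reads only what it needs.
    """
    by_predicate = defaultdict(list)  # predicate -> [(subject, object)]
    members = defaultdict(set)        # class -> {instances}
    for s, p, o in triples:
        if p == RDF_TYPE:
            members[o].add(s)
        else:
            by_predicate[p].append((s, o))
    return by_predicate, members


def scan(by_predicate, members, predicate, subject_class=None):
    """Read one relation's subgraph, optionally narrowed by subject class."""
    pairs = by_predicate.get(predicate, [])
    if subject_class is None:
        return list(pairs)
    allowed = members.get(subject_class, set())
    return [(s, o) for s, o in pairs if s in allowed]


def join_on_subject(left, right):
    """Hash join of two (subject, object) lists on the shared subject."""
    table = defaultdict(list)
    for s, o in left:
        table[s].append(o)
    return [(s, lo, ro) for s, ro in right for lo in table.get(s, [])]


# Made-up sample data, for illustration only.
triples = [
    ("alice", RDF_TYPE, "Student"),
    ("bob", RDF_TYPE, "Professor"),
    ("alice", "memberOf", "dept1"),
    ("bob", "memberOf", "dept1"),
    ("alice", "takesCourse", "course7"),
]

by_predicate, members = build_index(triples)

# Pattern 1: ?x rdf:type Student . ?x memberOf ?d  (class-restricted scan)
students_in_dept = scan(by_predicate, members, "memberOf", subject_class="Student")
# Pattern 2: ?x takesCourse ?c
courses = scan(by_predicate, members, "takesCourse")

# Join the two intermediate results on ?x.
result = join_on_subject(students_in_dept, courses)
print(result)
```

In this sketch the class restriction is applied while scanning the `memberOf` relation, so the type pattern never needs its own join and the intermediate result stays small. That is the kind of saving the abstract attributes to relation- and class-based splitting, though the actual system's data layout and join strategy may differ.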

SparkRDF: Elastic Discreted RDF Graph Processing Engine with Distributed Memory
