An elastic distributed parallel Hadoop system for bigdata platform and distributed inference engines

Dong Ho Song, Ji Ae Shin, Yean Jin In, Wan Gon Lee, Kang Se Lee
DOI: https://doi.org/10.7465/jkdi.2015.26.5.1129
2015-09-30
Journal of the Korean Data and Information Science Society
Abstract: The inference process generates additional triples from knowledge represented as RDF triples in semantic web technology. Tens of millions of triples, as the initial big data, together with the additionally inferred triples, become a knowledge base for applications such as a QA (question and answer) system. The inference engine requires more computing resources to process the triples generated during inference. Additional computing resources supplied by the underlying resource pool in cloud computing can shorten the execution time. This paper presents an algorithm that allocates the number of computing nodes "elastically" at runtime on Hadoop, depending on the size of the knowledge data fed in. The proposed model has a layered architecture: the top layer for applications, the middle layer for the distributed parallel inference engine that processes the triples, and the lower layer for elastic Hadoop and server virtualization. System algorithms and test data are analyzed and discussed in this paper. The model has the benefit that the rich set of legacy Hadoop applications can run faster on this system without any modification.
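The abstract does not spell out the allocation algorithm, so the following is only a minimal sketch of the general idea of sizing a cluster from the input size. It is not the authors' method. The function name `nodes_for_input`, the per-node capacity of 10 million triples, and the pool limits of 1 and 16 nodes are all hypothetical values chosen for illustration.

```python
import math


def nodes_for_input(triple_count, triples_per_node=10_000_000,
                    min_nodes=1, max_nodes=16):
    """Pick a node count proportional to the input size.

    All parameters are illustrative assumptions, not values from the paper:
    triples_per_node is the load one node is assumed to handle, and
    min_nodes/max_nodes are the assumed limits of the resource pool.
    """
    if triple_count < 0:
        raise ValueError("triple_count must be non-negative")
    # Round up so that a partially filled node still gets allocated.
    needed = math.ceil(triple_count / triples_per_node)
    # Clamp to what the resource pool can actually provide.
    return max(min_nodes, min(max_nodes, needed))


if __name__ == "__main__":
    for size in (0, 25_000_000, 1_000_000_000):
        print(size, "triples ->", nodes_for_input(size), "nodes")
```

In a setting like the one described, such a function would be re-evaluated at runtime as inference produces new triples, so the node count can grow with the knowledge base rather than being fixed up front.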