TIIS: A Two-Level Inverted-Index Scheme for Large-Scale Data Processing in the Parallel Database System

Lei Yu, Ge Fu, Huaiyuan Tan, Yan Jin, Hong Zhang, Xinran Liu, Xiaojia Xiang
DOI: https://doi.org/10.1109/mec.2013.6885464
2013-01-01
Abstract: Parallel database middleware built on a Service-Oriented Architecture is an inexpensive way to combine standalone database instances into a highly scalable relational data management platform. With the advent of large-scale data, however, such platforms face a serious challenge in text data retrieval. Motivated by this observation, we first design a parallel database middleware based on semi-structured data to support text retrieval. We then design a two-level inverted-index scheme, called TIIS, for full-text queries. TIIS can quickly locate result data in a large distributed database cluster storing large-scale data, and it greatly reduces network I/O and disk I/O. Experimental results show that, compared with Hive on the Hadoop Distributed File System in the same hardware environment, our system can perform typical TPC-H data analysis, and the cost of full-text queries decreases by 90% on average on 2 GB of commercial data.
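The abstract does not describe the internals of TIIS. As a rough illustration only, the sketch below shows one plausible reading of a "two-level" inverted index: a global level that maps each term to the database nodes containing it, and a per-node level that maps each term to local row identifiers. The class name `TwoLevelIndex`, the node names, the whitespace tokenizer, and the in-memory layout are all assumptions made for this example and are not taken from the paper.

```python
from collections import defaultdict


class TwoLevelIndex:
    """Hypothetical two-level inverted index (not the paper's implementation)."""

    def __init__(self):
        # Level 1 (assumed to live in the middleware): term -> node ids.
        self.term_to_nodes = defaultdict(set)
        # Level 2 (assumed to live on each node): node id -> term -> row ids.
        self.node_indexes = defaultdict(lambda: defaultdict(set))

    def add(self, node_id, row_id, text):
        """Index one row of text stored on the given node."""
        # Naive tokenizer (an assumption): lowercase and split on whitespace.
        for term in set(text.lower().split()):
            self.term_to_nodes[term].add(node_id)
            self.node_indexes[node_id][term].add(row_id)

    def search(self, term):
        """Return {node id: sorted row ids} for rows containing the term."""
        term = term.lower()
        hits = {}
        # Only nodes listed at level 1 are consulted; all others are skipped.
        for node_id in self.term_to_nodes.get(term, ()):
            hits[node_id] = sorted(self.node_indexes[node_id][term])
        return hits


index = TwoLevelIndex()
index.add("node-a", 1, "urgent shipment delayed")
index.add("node-a", 2, "regular deposits")
index.add("node-b", 7, "Urgent request")
index.add("node-c", 3, "final accounts")

# "node-c" holds no row containing "urgent", so it is never consulted.
print(index.search("urgent"))
```

Under this reading, a query touches only the nodes named by the first level, which is one way a scheme of this kind could reduce network I/O, while the second level avoids scanning whole tables on each node and so could reduce disk I/O.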