LuBase: A Search-Efficient Hybrid Storage System for Massive Text Data

Debin Jia, Zhengwei Liu, Xiaoyan Gu, Bo Li, Jingzi Gu, Weiping Wang, Dan Meng
DOI: https://doi.org/10.1007/978-3-319-27122-4_10
2015-01-01
Abstract: Recent years have witnessed great enthusiasm for big data analytics systems; some of them, offering high scalability and fault tolerance, are widely used in production. However, such systems are mostly designed for processing immutable data stored in HDFS and are not well suited to real-time text data in NoSQL databases such as HBase. In this paper, we propose LuBase, a search-efficient hybrid storage system for large-scale text data analytics scenarios. Beyond being a novel hybrid storage system with a fine-grained index, LuBase also introduces a new query processing flow that fully exploits a pre-built full-text index to accelerate interactive queries while achieving more efficient I/O. We implemented LuBase in a data analytics system based on Impala. Experimental results demonstrate that LuBase benefits substantially from the Lucene indexing technique and brings significant performance improvement to Impala when querying HBase.