EStore: An Effective Optimized Data Placement Structure for Hive

Li Xin, Li Hui, Huang Zhihao, Zhu Bing, Cai Jiawei
DOI: https://doi.org/10.1109/BigData.2016.7840952
2016-01-01
Abstract: The data warehouse system Hive has emerged as an important facility for data computing and storage. RCFile is a data placement structure implemented in Hive and designed specifically for data processing efficiency. In this paper, we propose several optimization schemes based on RCFile and introduce EStore, an optimized data placement structure that improves the query rate and reduces storage space for Hive. Specifically, EStore adopts both row-store and column-store within blocks, and further classifies the columns by the frequency of each table column. We also employ the classic RDP (Row-Diagonal Parity) code to store the files of a data table. Experiments on a real cluster show that EStore outperforms RCFile in both data query rate and storage space.
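The abstract mentions the classic RDP code without spelling it out. As a rough illustration only, the following is a minimal sketch of textbook RDP (Row-Diagonal Parity) encoding over one stripe, assuming a prime `p`, `p - 1` data columns, `p - 1` rows, and small integers standing in for blocks. The function name `rdp_encode` and the data layout are hypothetical and are not taken from the EStore implementation.

```python
from functools import reduce
from operator import xor


def rdp_encode(data, p):
    """Textbook RDP encoding of one stripe (illustrative sketch only).

    data: list of p-1 rows, each a list of p-1 integer "blocks"
          (one entry per data column); p must be prime.
    Returns (row_parity, diag_parity), each a list of p-1 integers.
    """
    rows = p - 1
    assert len(data) == rows and all(len(r) == p - 1 for r in data)

    # Row parity: XOR of all data blocks in the same row.
    row_parity = [reduce(xor, r, 0) for r in data]

    # Diagonal parity covers the data columns AND the row-parity column.
    ext = [data[r] + [row_parity[r]] for r in range(rows)]

    # Block (r, c) lies on diagonal (r + c) mod p. Diagonals 0..p-2 are
    # stored; diagonal p-1 is the "missing" diagonal and is skipped.
    diag_parity = [0] * rows
    for r in range(rows):
        for c in range(p):
            d = (r + c) % p
            if d != p - 1:
                diag_parity[d] ^= ext[r][c]
    return row_parity, diag_parity


if __name__ == "__main__":
    row_p, diag_p = rdp_encode([[1, 2], [3, 4]], p=3)
    print(row_p, diag_p)
```

With the two parity columns, a stripe can tolerate the loss of any two columns. How EStore maps table files onto such stripes is not described in the abstract, so that mapping is left out of this sketch.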