MPDBS: A Multi-Level Parallel Database System Based on B-Tree
Lei Yu, Ge Fu, Yan Jin, Xiaojia Xiang, Huaiyuan Tan, Hong Zhang, Xinran Liu, Xiaobo Zhu
DOI: https://doi.org/10.1109/snpd.2015.7176228
2015-01-01
Abstract: Parallel processing systems have been extensively developed and are used in numerous commercial servers for large-scale data analysis. However, these systems cannot achieve scalability, reliability, and efficiency simultaneously. Motivated by this observation, a Multi-level Parallel Database System based on a B-tree structure (MPDBS) is designed for large-scale structured and semi-structured data. Correspondingly, a multi-level index scheme (MLIS) is proposed in this paper. Based on the MPDBS framework and the MLIS scheme, the system can execute analysis tasks and full-text queries in parallel and efficiently, while greatly reducing network I/O and disk I/O. The optimal architecture of MPDBS is also derived mathematically. Experimental results show that, given the same hardware configuration and the TPC-H benchmark, MPDBS reduces query latency (for statistical, keyword, and point queries) on 200 GB of commercial data by 95% compared with Hive running on the Hadoop Distributed File System (HDFS).
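The reported 95% decline in latency can be restated as a speedup factor. The short calculation below is only an illustration: the symbols $T_{\mathrm{Hive}}$ and $T_{\mathrm{MPDBS}}$ are introduced here and do not appear in the abstract, and it assumes the 95% figure is a relative reduction in the latency of the same query on the same hardware.

```latex
\begin{align*}
T_{\mathrm{MPDBS}} &= (1 - 0.95)\, T_{\mathrm{Hive}} = 0.05\, T_{\mathrm{Hive}}, \\
\text{speedup} &= \frac{T_{\mathrm{Hive}}}{T_{\mathrm{MPDBS}}} = \frac{1}{0.05} = 20 .
\end{align*}
```

Under that assumption, a hypothetical query that takes 100 s on Hive would take about 5 s on MPDBS. The 100 s value is purely illustrative and is not a measurement from the paper.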