Parallel Data Warehouses Architecture Based on PC Cluster

Jin-guo YOU, Jian-qing XI, Yu-hong XIAO
DOI: https://doi.org/10.3969/j.issn.1000-3428.2009.20.025
2009-01-01
Abstract: As data warehouses grow in size, ensuring the performance of ad hoc queries over massive data becomes a major challenge. To address this issue, this paper proposes a parallel data warehouse architecture, HDW, built on a PC cluster. It employs Google's GFS and Bigtable for distributed storage management and MapReduce to parallelize OLAP computation tasks. In addition, it provides an XMLA interface for front-end applications. Experimental results on an 18-node cluster show that HDW scales well and can process large data sets of at least 10 million tuples.
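To illustrate how an OLAP aggregation can be expressed in the MapReduce style the abstract mentions, the following is a minimal single-process sketch. It is not HDW's implementation: the fact tuples, the function names (`partition`, `map_phase`, `shuffle`, `reduce_phase`, `aggregate`) and the choice of SUM as the measure are all assumptions made for illustration, and the "workers" are simulated by plain list chunks rather than cluster nodes.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical fact tuples: (region, product, sales_amount).
FACTS = [
    ("east", "tv", 100),
    ("east", "radio", 40),
    ("west", "tv", 70),
    ("east", "tv", 30),
    ("west", "radio", 10),
]


def partition(rows, n):
    """Split the fact rows into n chunks, one per simulated worker."""
    return [rows[i::n] for i in range(n)]


def map_phase(chunk, group_by):
    """Emit (group key, measure) pairs for one chunk of fact rows."""
    return [(tuple(row[i] for i in group_by), row[-1]) for row in chunk]


def shuffle(pairs):
    """Collect emitted values under their key, as happens between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the measure for each group key."""
    return {key: sum(values) for key, values in grouped.items()}


def aggregate(rows, group_by, workers=3):
    """Run a GROUP BY ... SUM over the rows in map/shuffle/reduce steps."""
    chunks = partition(rows, workers)
    mapped = chain.from_iterable(map_phase(c, group_by) for c in chunks)
    return reduce_phase(shuffle(mapped))


# Two group-bys over the same facts, at different levels of detail.
sales_by_region = aggregate(FACTS, group_by=(0,))
sales_by_region_product = aggregate(FACTS, group_by=(0, 1))
```

Because each chunk is mapped independently and SUM is associative, the result does not depend on how the rows are partitioned, which is the property that lets this kind of aggregation be spread across nodes.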