Improved Data Partitioning for Building Large ROLAP Data Cubes in Parallel

Ying Chen, Frank K. H. A. Dehne, Todd Eavis, Andrew Rau-Chaplin
DOI: https://doi.org/10.4018/jdwm.2006010101
2006-01-01
International Journal of Data Warehousing and Mining
Abstract: This paper presents an improved parallel method for generating ROLAP data cubes on a shared-nothing multiprocessor, based on a novel optimized data partitioning technique. Since no shared disk is required, the method can be used on highly scalable processor clusters consisting of standard PCs with local disks only, connected via a data switch. Experiments show that the improved parallel method provides optimal, linear speedup for at least 32 processors. The approach, which uses a ROLAP representation of the data cube, is well suited to large data warehouses and high-dimensional data, and supports the generation of both fully materialized and partially materialized data cubes.
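To make the terms in the abstract concrete, the following is a minimal sketch of what a fully materialized ROLAP cube is (one relational table per group-by, 2^d tables for d dimensions) and of how per-partition results can be merged when the aggregate is a sum. It is not the paper's algorithm: the round-robin `partition` function, the sample `ROWS`, and all function names are assumptions made for illustration, and the partitions are processed sequentially rather than on separate processors.

```python
from collections import defaultdict
from itertools import combinations


def group_by(rows, dims, view):
    """Sum the measure (last field of each row) over the dimensions in `view`."""
    idx = [dims.index(d) for d in view]
    table = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in idx)
        table[key] += row[-1]
    return dict(table)


def full_cube(rows, dims):
    """Fully materialized cube: one table per subset of the dimensions."""
    cube = {}
    for k in range(len(dims) + 1):
        for view in combinations(dims, k):
            cube[view] = group_by(rows, dims, view)
    return cube


def partition(rows, p):
    """Naive round-robin split (an assumption, NOT the paper's partitioning)."""
    return [rows[i::p] for i in range(p)]


def merge(tables):
    """Combine partial sums for the same view computed on different partitions."""
    out = defaultdict(int)
    for table in tables:
        for key, value in table.items():
            out[key] += value
    return dict(out)


def partitioned_cube(rows, dims, p):
    """Build a local cube per partition, then merge view by view."""
    local = [full_cube(part, dims) for part in partition(rows, p)]
    return {view: merge([cube[view] for cube in local]) for view in local[0]}


# Hypothetical fact table: (store, product, sales).
DIMS = ("store", "product")
ROWS = [("s1", "p1", 10), ("s1", "p2", 5), ("s2", "p1", 7), ("s2", "p2", 3)]
```

A partially materialized cube, in this sketch, would simply compute `group_by` for a chosen subset of the views instead of all of them. The merge step is where a naive split becomes costly on a real cluster, which is why the choice of partitioning matters.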