MapReduce-based data aggregation algorithms

Leng Fangling, Bao Yubin, Gao Wei, Yu Ge
DOI: https://doi.org/10.3969/j.issn.2095-2783.2011.07.001
2011-01-01
Abstract: Aggregation is one of the most typical data pre-processing methods for improving the efficiency of computation over massive data in data warehouses, but it requires enormous computing power and storage capacity. A set of MapReduce-based aggregation algorithms for massive data is therefore proposed, covering data selection, projection, and equi-join, among others, and implementing the counting, summing, and averaging operations. Together they form a family of aggregation algorithms. Experiments show that the algorithms make full use of the cluster's computing power and storage capacity, greatly improving the efficiency of the aggregation operations and of queries over massive data that are answered from the aggregation results.
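As an illustration only, the following single-process Python sketch shows how selection, projection, and the count, sum, and average operations could be expressed in a map/shuffle/reduce style. It is not the authors' implementation: the function names (`map_phase`, `shuffle`, `reduce_phase`, `aggregate`), the record layout, and the sample data are all hypothetical, and a real deployment would run the same logic on a cluster framework such as Hadoop.

```python
from collections import defaultdict


def map_phase(records, predicate, key_field, value_field):
    """Selection and projection: keep rows that pass `predicate` and
    emit (group key, (value, 1)) so sums and counts travel together."""
    for row in records:
        if predicate(row):
            yield row[key_field], (row[value_field], 1)


def shuffle(pairs):
    """Group emitted values by key (done by the framework on a real cluster)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Compute count, sum, and average for each group key."""
    result = {}
    for key, values in groups.items():
        total = sum(v for v, _ in values)
        count = sum(c for _, c in values)
        result[key] = {"count": count, "sum": total, "avg": total / count}
    return result


def aggregate(records, predicate, key_field, value_field):
    """Chain the three phases into one aggregation job."""
    pairs = map_phase(records, predicate, key_field, value_field)
    return reduce_phase(shuffle(pairs))


# Hypothetical sample data, invented for this sketch.
sales = [
    {"region": "north", "amount": 10, "year": 2010},
    {"region": "north", "amount": 30, "year": 2010},
    {"region": "south", "amount": 5, "year": 2010},
    {"region": "south", "amount": 7, "year": 2009},
]

summary = aggregate(sales, lambda r: r["year"] == 2010, "region", "amount")
print(summary)
```

Emitting `(value, 1)` pairs rather than bare values is a deliberate choice in this sketch: partial sums and counts can be merged in any order, whereas partial averages cannot, so the average is derived only in the final reduce step.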