Efficient Skew Handling in Online Aggregation in the Cloud

Xiang Ci, Fengming Wang, Yantao Gan, Xiaofeng Meng
DOI: https://doi.org/10.1109/icdew.2016.7495608
2016-01-01
Abstract: With the development of social networks, the mobile Internet, and similar technologies, an increasing amount of data is being generated, exceeding the processing capacity of traditional data management tools. In many real-life applications, users can accept approximate answers accompanied by accuracy guarantees, and one of the most commonly used approaches is online aggregation. Online aggregation answers aggregation queries over random samples and refines the result as more samples are received. In the era of big data, as more and more data analysis applications migrate to the cloud, online aggregation in the cloud has drawn increasing attention. Data skew can greatly affect the results of online aggregation in the cloud, and in fact two special types of data skew arise in this setting. In this paper, we propose two methods to deal with these two types of data skew respectively. We implement our methods in a cloud online aggregation system called COLA, and the experimental results demonstrate that our methods effectively eliminate the negative effects of data skew and yield better results.
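The abstract describes online aggregation only at a high level: answer the query over random samples and refine the answer as more samples arrive. The following is a minimal, single-node sketch of that general idea for an AVG query. It is not COLA's implementation and not either of the paper's skew-handling methods. The names `OnlineAvg` and `online_average` are hypothetical, and the sketch assumes a 95% confidence interval from the central limit theorem, random order obtained by shuffling the input, and no finite-population correction.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class OnlineAvg:
    """Hypothetical running AVG estimate with a CLT-based confidence interval.

    Welford's algorithm updates the mean and variance in O(1) per sample
    without storing the samples themselves.
    """
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean

    def add(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def interval(self, z: float = 1.96) -> tuple[float, float]:
        """Return (estimate, half_width); half_width is inf until n >= 2."""
        if self.n < 2:
            return self.mean, math.inf
        variance = self.m2 / (self.n - 1)  # sample variance
        # Assumption: no finite-population correction is applied.
        return self.mean, z * math.sqrt(variance / self.n)


def online_average(values, report_every=1000, seed=0):
    """Scan values in random order, yielding (n, estimate, half_width)."""
    order = list(values)
    random.Random(seed).shuffle(order)  # random sampling without replacement
    agg = OnlineAvg()
    for x in order:
        agg.add(x)
        if agg.n % report_every == 0:
            est, hw = agg.interval()
            yield agg.n, est, hw


if __name__ == "__main__":
    data = [float(i % 100) for i in range(10_000)]  # synthetic data, true mean 49.5
    for n, est, hw in online_average(data, report_every=2000):
        print(f"n={n:5d}  AVG ~ {est:7.3f} +/- {hw:.3f}")
```

Each report is an approximate answer with an error bound that narrows as more samples are processed. Because the half-width grows with the sample variance, a few extreme values make the interval shrink more slowly. The sketch does not model the cloud setting or the two specific types of skew the paper addresses.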
What problem does this paper attempt to address?