OPTIMA: On-Line Partitioning Skew Mitigation for MapReduce with Resource Adjustment.

Zhihong Liu,Qi Zhang,Raouf Boutaba,Yaping Liu,Baosheng Wang
DOI: https://doi.org/10.1007/s10922-015-9362-8
2016-01-01
Journal of Network and Systems Management
Abstract:Partitioning skew has been shown to be a major issue that can significantly prolong the execution time of MapReduce jobs. Most of the existing off-line heuristics for partitioning skew mitigation are inefficient; they have to wait for the completion of all the map tasks. Some solutions can tackle this problem on-line, but will impose an additional overhead by repartitioning the workload of overloaded tasks. In this paper, we present OPTIMA, an on-line partitioning skew mitigation technique for MapReduce. OPTIMA predicts the workload distribution of reduce tasks at run-time, leverages the deviation detection technique to identify the overloaded tasks and pro-actively adjusts resource allocation for these tasks to reduce their execution time. We provide the upper bound of OPTIMA in time complexity, while allowing OPTIMA to perform totally on-line. Through experiments using both real and synthetic workloads running on an 11-node Hadoop cluster, we have observed OPTIMA can effectively mitigate the partitioning skew and improved the job completion time by up to 36.73 % in our experiments.
What problem does this paper attempt to address?