An Optimized Speculative Execution Strategy Based on Local Data Prediction in a Heterogeneous Hadoop Environment

Xiaodong Liu, Qi Liu
DOI: https://doi.org/10.1109/cse-euc.2017.208
2017-01-01
Abstract: Hadoop is a widely used distributed computing framework for processing large-scale data. "Straggling tasks" seriously degrade Hadoop performance because of the imbalanced distribution of slow tasks. Speculative execution (SE) addresses straggling tasks by monitoring the real-time progress of running tasks and replicating potential "stragglers" on another node, which increases the chance that a backup task completes before the original. Currently proposed SE strategies face challenges such as misjudging straggling tasks and selecting improper backup nodes, which make SE, and in turn the Hadoop system, perform inefficiently. In this paper, we propose an optimized SE strategy based on local data prediction. It collects task execution information in real time, uses Locally Weighted Regression (LWR) to predict the remaining time of each running task, and selects an appropriate backup node according to actual requirements. It also incorporates a cost-benefit model to maximize the effectiveness of SE. Experimental results show that the proposed strategy, implemented in Hadoop-2.6.0, improves the accuracy of selecting potential straggler candidates and performs better in various situations in a heterogeneous Hadoop environment.
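The abstract does not specify the LWR formulation, so the following is only a minimal Python sketch of the general idea under stated assumptions. It assumes that each running task reports (progress, elapsed seconds) samples, fits elapsed time as a locally weighted linear function of progress using a Gaussian kernel with a hypothetical bandwidth `tau`, and extrapolates to progress 1.0 to estimate the remaining time. The helper `should_speculate` and its `min_gain` threshold are hypothetical stand-ins for illustration; they are not the paper's cost-benefit model.

```python
import math
from typing import Sequence


def lwr_predict(xs: Sequence[float], ys: Sequence[float],
                query: float, tau: float = 0.2) -> float:
    """Locally weighted linear regression: fit y = a + b*x with Gaussian
    weights centred on `query`, then evaluate the fitted line at `query`."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) samples of equal length")
    if len(set(xs)) < 2:
        raise ValueError("need at least two distinct x values")
    # Gaussian kernel (assumed): samples whose x lies near the query weigh more.
    ws = [math.exp(-((x - query) ** 2) / (2 * tau ** 2)) for x in xs]
    sw = sum(ws)
    if sw <= 0.0:
        raise ValueError("all weights underflowed; increase tau")
    # Weighted means, then centred sums for a numerically stable fit.
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept + slope * query


def predict_remaining_time(progress: Sequence[float],
                           elapsed: Sequence[float],
                           tau: float = 0.2) -> float:
    """Estimate seconds until the task reaches progress 1.0 (assumption:
    elapsed time is modelled as a function of progress)."""
    finish_time = lwr_predict(progress, elapsed, query=1.0, tau=tau)
    return max(0.0, finish_time - elapsed[-1])


def should_speculate(remaining: float, backup_estimate: float,
                     min_gain: float = 0.0) -> bool:
    """Hypothetical rule: launch a backup only if it is expected to finish
    more than `min_gain` seconds before the original task."""
    return remaining - backup_estimate > min_gain


if __name__ == "__main__":
    # A task that advances 10% every 10 seconds.
    progress = [0.1, 0.2, 0.3, 0.4, 0.5]
    elapsed = [10.0, 20.0, 30.0, 40.0, 50.0]
    remaining = predict_remaining_time(progress, elapsed)
    print(f"predicted remaining time: {remaining:.1f} s")
    print("speculate:", should_speculate(remaining, backup_estimate=20.0,
                                         min_gain=10.0))
```

For this perfectly linear sample the fit recovers roughly 50 seconds of remaining work. Weighting samples near the query point lets the estimate follow a task's most relevant recent behaviour when its rate changes, rather than averaging over its whole history as a single global linear fit would.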