Research of Scheduling Strategy Based on Fault Tolerance in Hadoop Platform

Zhengwu Yuan, Jinli Wang
DOI: https://doi.org/10.1007/978-3-642-41908-9_52
2013-01-01
Abstract: Scheduling is a prominent problem in cloud computing, and scheduling decisions should take node failures into account. This paper first discusses the disadvantages of current Hadoop task scheduling algorithms and of fault tolerance in the Hadoop platform. It then presents a scheduling strategy based on fault tolerance. Under this strategy, the cluster monitors the speed of its nodes and backs up to a high-performance cache server the intermediate MapReduce results produced by nodes that are likely to fail soon. When several nodes fail, the cluster can therefore quickly resume execution from its previous state: the Reduce nodes read the Map output from the cache server, or from both the cache server and the original node, and the cluster keeps its high performance. Finally, a computer simulation is carried out. It shows that the presented strategy is effective when node failures occur.
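The strategy in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and does not use any Hadoop API: the `Node` class, the `progress_rate` metric, the "slower than a fraction of the cluster mean" rule, and the dictionary standing in for the cache server are all assumptions made for illustration, since the abstract does not specify how at-risk nodes are detected.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Node:
    """Hypothetical worker node; not a Hadoop class."""
    name: str
    progress_rate: float            # assumed speed metric reported by the node
    map_outputs: list = field(default_factory=list)


def find_at_risk(nodes, threshold=0.5):
    """Return nodes running slower than `threshold` times the cluster mean.

    Assumption: a slow node is treated as one that may fail soon.
    """
    avg = mean(n.progress_rate for n in nodes)
    return [n for n in nodes if n.progress_rate < threshold * avg]


def back_up(nodes, cache, threshold=0.5):
    """Copy the intermediate Map output of at-risk nodes to the cache."""
    for n in find_at_risk(nodes, threshold):
        cache[n.name] = list(n.map_outputs)


def read_map_output(node, cache, failed):
    """Reduce-side read: use the cache if the node has failed.

    Returns None when a failed node was never backed up, meaning its
    Map tasks would have to be re-executed.
    """
    if node.name in failed:
        return cache.get(node.name)
    return node.map_outputs


# Usage: node "c" is slow, so its Map output is backed up before it fails.
nodes = [
    Node("a", 1.0, ["a-part-0"]),
    Node("b", 1.0, ["b-part-0"]),
    Node("c", 0.2, ["c-part-0"]),
]
cache = {}                          # stands in for the cache server
back_up(nodes, cache)
recovered = read_map_output(nodes[2], cache, failed={"c"})
```

In this sketch only the slow node's output is copied, which keeps the backup cost proportional to the number of suspect nodes rather than to the size of the cluster.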