A Real-Time Scheduling Strategy Based on Processing Framework of Hadoop

Fangbing Chen, Ji Liu, Yuesheng Zhu
DOI: https://doi.org/10.1109/bigdatacongress.2017.48
2017-01-01
Abstract: Owing to their batch processing capability and distributed storage, MapReduce and HDFS have always been the core components of Hadoop. Many studies still focus on improving and optimizing MapReduce task scheduling algorithms. However, these algorithms do not perform well for real-time processing. In this paper, we design a real-time approach based on the Hadoop application framework that has practical value in the field of real-time processing, and we put forward real-time scheduling algorithms for the file storage layer and the computing layer of the framework. The motivation is to provide real-time processing algorithms for Hadoop that address big data analysis problems. For the file storage layer, we propose a data management algorithm named HDMA. The algorithm considers a variety of factors, such as load balancing, data locality, and the heterogeneous properties of each machine in a heterogeneous cluster. HDMA can significantly improve the computing performance of the whole cluster. For the computing layer, we propose a dynamic resource allocation scheduling algorithm based on job length, named LERDA. LERDA divides jobs into several levels according to the length of each job and thus prevents short jobs from waiting for a long time. Experiments show that (a) our algorithms increase data locality by 15% compared to other algorithms, (b) our algorithms improve the real-time performance of the map phase by approximately 20% in the best case, and (c) LERDA can prevent short jobs from suffering starvation.
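The idea of dividing jobs into levels by length can be illustrated with a minimal sketch. This is not the LERDA algorithm itself, whose details the abstract does not give. The thresholds in `LEVEL_BOUNDS`, the use of an estimated runtime in seconds as the measure of "length", and the names `level_of` and `LengthLevelScheduler` are all assumptions made for illustration. The sketch uses strict priority between levels, which keeps short jobs from waiting behind long ones but does not model dynamic resource allocation.

```python
from collections import deque

# Hypothetical level boundaries in seconds of estimated runtime (not from the paper):
# level 0 is "short" (< 60 s), level 1 is "medium" (< 600 s), level 2 is "long".
LEVEL_BOUNDS = [60, 600]


def level_of(estimated_seconds, bounds=LEVEL_BOUNDS):
    """Return the index of the first level whose upper bound exceeds the estimate."""
    for level, bound in enumerate(bounds):
        if estimated_seconds < bound:
            return level
    return len(bounds)  # longer than every bound: last level


class LengthLevelScheduler:
    """Illustrative multi-level queue: one FIFO queue per job-length level."""

    def __init__(self, bounds=LEVEL_BOUNDS):
        self.bounds = bounds
        self.queues = [deque() for _ in range(len(bounds) + 1)]

    def submit(self, job_id, estimated_seconds):
        """Place the job in the queue that matches its estimated length."""
        self.queues[level_of(estimated_seconds, self.bounds)].append(job_id)

    def next_job(self):
        """Pop from the shortest non-empty level; FIFO order within a level."""
        for queue in self.queues:
            if queue:
                return queue.popleft()
        return None  # nothing is waiting
```

With these assumed thresholds, a job estimated at 10 seconds that is submitted after a one-hour job would still be dispatched first. A production scheduler would also need to reserve some capacity for the longer levels, since strict priority alone can starve long jobs.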