Constraint programming versus heuristic approach to MapReduce scheduling problem in Hadoop YARN for energy minimization

Vaibhav Pandey, Poonam Saini
DOI: https://doi.org/10.1007/s11227-020-03516-3
IF: 3.3
2021-01-04
The Journal of Supercomputing
Abstract: In this paper, we consider a deadline-constrained MapReduce (MR) scheduling problem of minimizing energy consumption in Hadoop's generic resource manager, Yet Another Resource Negotiator (YARN). The problem is modeled as an integer programming (IP) problem using time-indexed decision variables. We propose two solution approaches. First, we give a heuristic algorithm that generates sub-optimal schedules in polynomial time. Second, we propose a novel constraint programming (CP) model, as an alternative to the IP model, which always generates optimal schedules when solved by a CP solver. CP is a relatively new alternative to IP-based branch-and-cut algorithms for exactly solving NP-hard optimization problems. We performed several experiments to compare the two proposed approaches on real data traces of a wide variety of MR jobs from the HiBench and PUMA benchmark suites. For large-scale big data jobs, the heuristic algorithm provides sub-optimal results in a very small amount of time. The CP approach, on the other hand, not only gives optimal results but also takes little time compared with IP-based approaches; it can therefore be used in non-time-critical situations to obtain an optimal schedule. A few further experiments compared the tightest satisfiable deadline under both approaches, with the conclusion that the CP technique is able to produce optimal schedules under tighter deadline constraints than the heuristic approach. Finally, we investigate the sensitivity of the total energy consumption of tasks and of the execution time of each approach to the number of tasks and to the deadline.
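The trade-off the abstract describes, a fast heuristic that may return a sub-optimal schedule versus an exact method that returns the optimum, can be illustrated on a toy instance. The sketch below is not the paper's IP or CP model and not its heuristic: the task data, the two-machine setup, the greedy rule, and the brute-force search standing in for an exact solver are all assumptions made purely for illustration.

```python
from itertools import product

# Hypothetical toy instance (not from the paper):
# TIME[t][m] and ENERGY[t][m] are the run time and energy of task t on machine m.
# Machine 0 is slow but frugal, machine 1 is fast but power-hungry.
TIME = [(4, 2), (4, 2)]
ENERGY = [(4, 5), (1, 8)]
DEADLINE = 4
N_MACHINES = 2


def machine_loads(assignment):
    """Total busy time per machine when tasks run back to back."""
    load = [0] * N_MACHINES
    for task, machine in enumerate(assignment):
        load[machine] += TIME[task][machine]
    return load


def total_energy(assignment):
    """Sum of the energy of every task on its assigned machine."""
    return sum(ENERGY[task][machine] for task, machine in enumerate(assignment))


def greedy_schedule():
    """Assumed greedy rule: each task takes the cheapest machine that still meets the deadline."""
    load = [0] * N_MACHINES
    assignment = []
    for task in range(len(TIME)):
        feasible = [m for m in range(N_MACHINES)
                    if load[m] + TIME[task][m] <= DEADLINE]
        if not feasible:
            return None  # the greedy rule found no deadline-respecting placement
        best = min(feasible, key=lambda m: ENERGY[task][m])
        load[best] += TIME[task][best]
        assignment.append(best)
    return tuple(assignment)


def exact_schedule():
    """Brute-force stand-in for an exact solver: try every assignment."""
    best = None
    for assignment in product(range(N_MACHINES), repeat=len(TIME)):
        if max(machine_loads(assignment)) > DEADLINE:
            continue  # violates the deadline
        if best is None or total_energy(assignment) < total_energy(best):
            best = assignment
    return best


if __name__ == "__main__":
    for name, schedule in (("greedy", greedy_schedule()), ("exact", exact_schedule())):
        print(name, schedule, total_energy(schedule))
```

On this instance the greedy rule places task 0 on the frugal machine first, which forces task 1 onto the power-hungry one for a total energy of 12, whereas the exhaustive search finds the swapped assignment with a total energy of 6. Exhaustive enumeration grows exponentially with the number of tasks, which is why exact approaches in practice rely on IP or CP solvers rather than brute force.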