A reliability task scheduling algorithm with optimizing makespan in heterogeneous systems

Jing Wei-Peng, Wu Zhi-Bo, Liu Hong-Wei, Jian Dong
2012-01-01
Abstract: Fault tolerance and makespan (the schedule length) are important requirements in several distributed heterogeneous systems. In this paper we propose a fault-tolerant scheduling heuristic for precedence-constrained tasks based on the primary-backup replication scheme. We take a bi-criteria approach: we aim to minimize the makespan while also taking into account the failure probability of the application, and the user can choose a trade-off between reliability maximization and makespan minimization. Major achievements include low complexity and a reduction in the number of additional communications introduced by the replication and clustering mechanism. Simulation results show that, compared with existing scheduling algorithms in the literature, our scheduling algorithm improves both reliability and performance. © 2012 TSI Press.
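The abstract does not give the heuristic itself, so the following is only a hypothetical sketch of how a user-tunable reliability/makespan trade-off with primary-backup replication can work. It is not the authors' algorithm. It ignores precedence constraints, communication costs and clustering. It also assumes an exponential failure model (a processor with constant failure rate λ survives time t with probability exp(-λt)), independent failures, and passive backups that do not delay the fault-free schedule. The weight `alpha`, the min-max normalisation and all function names are illustrative assumptions.

```python
import math
from itertools import permutations


def task_reliability(exec_time, failure_rate):
    # Assumed exponential failure model: P(no failure during exec_time).
    return math.exp(-failure_rate * exec_time)


def pair_reliability(r_primary, r_backup):
    # The task is lost only if primary and backup both fail (independence assumed).
    return 1.0 - (1.0 - r_primary) * (1.0 - r_backup)


def normalise(values):
    # Rescale to [0, 1] so finish time and failure probability are comparable.
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


def schedule(exec_times, failure_rates, alpha):
    """Hypothetical greedy bi-criteria primary-backup placement of independent tasks.

    exec_times[i][p] -- execution time of task i on processor p
    failure_rates[p] -- constant failure rate of processor p (assumption)
    alpha            -- 1.0 favours makespan only, 0.0 favours reliability only
    Returns (placements, makespan, reliability); placements holds (primary, backup).
    """
    n_procs = len(failure_rates)
    ready = [0.0] * n_procs  # fault-free finish time of each processor
    placements, reliability = [], 1.0
    for times in exec_times:
        pairs = list(permutations(range(n_procs), 2))  # primary != backup
        finishes = [ready[p] + times[p] for p, _ in pairs]
        rels = [pair_reliability(task_reliability(times[p], failure_rates[p]),
                                 task_reliability(times[b], failure_rates[b]))
                for p, b in pairs]
        fails = [1.0 - r for r in rels]
        # Weighted sum of the two normalised criteria; the first minimum wins ties.
        costs = [alpha * f + (1.0 - alpha) * q
                 for f, q in zip(normalise(finishes), normalise(fails))]
        k = min(range(len(pairs)), key=costs.__getitem__)
        p, b = pairs[k]
        ready[p] = finishes[k]  # passive backup: only the primary occupies time
        placements.append((p, b))
        reliability *= rels[k]
    return placements, max(ready), reliability
```

For example, with `exec_times = [[10, 6, 3], [8, 5, 2], [12, 7, 4]]` and `failure_rates = [0.001, 0.01, 0.05]` (a fast but failure-prone processor 2), `alpha=1.0` gives a makespan of 7.0 and `alpha=0.0` gives 30.0 with a higher overall reliability. Intermediate values of `alpha` fall between these two extremes. Because each task's reliability here depends only on where it runs, not when, `alpha=0.0` maximizes the sketch's reliability exactly, whereas the makespan side remains a greedy heuristic.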