Quantitative Fault-Tolerance for Reliable Workflows on Heterogeneous IaaS Clouds

Guoqi Xie, Gang Zeng, Renfa Li, Keqin Li
DOI: https://doi.org/10.1109/tcc.2017.2780098
IF: 5.697
2017-01-01
IEEE Transactions on Cloud Computing
Abstract: The reliability requirement is one of the most important quality-of-service (QoS) requirements and should be satisfied for a reliable workflow in cloud computing. Primary-backup replication is an important software fault-tolerance technique used to satisfy the reliability requirement. Recent works studied quantitative fault-tolerant scheduling that reduces execution cost by minimizing the number of replicas while satisfying the reliability requirement of a workflow on heterogeneous infrastructure-as-a-service (IaaS) clouds. However, a minimum number of replicas does not necessarily lead to the minimum execution cost or the shortest schedule length in a heterogeneous IaaS cloud. In this study, we propose the quantitative fault-tolerant scheduling algorithms QFEC and QFEC+, which minimize execution cost, and QFSL and QFSL+, which minimize schedule length, while satisfying the reliability requirements of workflows. Extensive experimental results show that (1) compared with the state-of-the-art algorithms, the proposed algorithms achieve lower execution cost and shorter schedule length, although the number of replicas is not minimal; (2) QFEC and QFEC+ are designed to reduce execution cost, and QFEC+ is better than QFEC for all low-parallelism and high-parallelism workflows; and (3) QFSL and QFSL+ are designed to decrease schedule length, and QFSL+ is better than QFSL for all low-parallelism and high-parallelism workflows.
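The abstract's central observation, that the fewest replicas do not necessarily give the lowest execution cost, can be illustrated on a single task. The sketch below is not QFEC, QFEC+, QFSL, or QFSL+; it is a brute-force search under stated assumptions: each replica fails independently, a replica's reliability follows the exponential model R = exp(-λw) (λ is the VM's failure rate, w the task's execution time on that VM), and the task succeeds if at least one replica succeeds. All VM names, failure rates, execution times, and prices are hypothetical values chosen for illustration, not data from the paper.

```python
import math
from itertools import combinations

# Hypothetical VM types for ONE task:
# (name, failure rate lambda per time unit, execution time w, price per time unit)
VMS = [
    ("vm_fast", 0.0010, 10.0, 3.0),
    ("vm_mid", 0.0020, 15.0, 1.0),
    ("vm_slow", 0.0025, 20.0, 0.5),
]


def replica_reliability(lam, w):
    # Assumed exponential failure model: R = exp(-lambda * w)
    return math.exp(-lam * w)


def task_reliability(replicas):
    # Assuming independent failures, the task fails only if every replica fails
    fail = 1.0
    for _, lam, w, _ in replicas:
        fail *= 1.0 - replica_reliability(lam, w)
    return 1.0 - fail


def cost(replicas):
    # Execution cost = sum over replicas of (execution time * unit price)
    return sum(w * price for _, _, w, price in replicas)


def feasible_sets(vms, goal):
    # Enumerate every set of distinct VMs whose combined reliability meets the goal
    for k in range(1, len(vms) + 1):
        for combo in combinations(vms, k):
            if task_reliability(combo) >= goal:
                yield combo


def min_replicas(vms, goal):
    # Fewest replicas first; ties broken by cost
    return min(feasible_sets(vms, goal), key=lambda c: (len(c), cost(c)))


def min_cost(vms, goal):
    # Lowest cost first; ties broken by replica count
    return min(feasible_sets(vms, goal), key=lambda c: (cost(c), len(c)))


if __name__ == "__main__":
    goal = 0.99  # hypothetical per-task reliability requirement
    a = min_replicas(VMS, goal)
    b = min_cost(VMS, goal)
    print([v[0] for v in a], cost(a))  # ['vm_fast'] 30.0
    print([v[0] for v in b], cost(b))  # ['vm_mid', 'vm_slow'] 25.0
```

With these numbers, one replica on `vm_fast` (reliability about 0.990) is the minimum-replica solution at cost 30, while two replicas on the cheaper `vm_mid` and `vm_slow` (combined reliability about 0.9986) meet the same requirement at cost 25. Exhaustive enumeration is only practical for a handful of VMs per task; it is used here purely to make the trade-off visible.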
What problem does this paper attempt to address?