Elastic Reliability Optimization Through Peer-to-Peer Checkpointing in Cloud Computing.
Juzi Zhao, Yu Xiang, Tian Lan, H. Howie Huang, Suresh Subramaniam
DOI: https://doi.org/10.1109/tpds.2016.2571281
IF: 5.3
2016-01-01
IEEE Transactions on Parallel and Distributed Systems
Abstract: Modern day data centers coordinate hundreds of thousands of heterogeneous tasks and aim at delivering highly reliable cloud computing services. Although offering equal reliability to all users benefits everyone at the same time, users may find such an approach either inadequate or too expensive to fit their individual requirements, which may vary dramatically. In this paper, we propose a novel method for providing elastic reliability optimization in cloud computing. Our scheme makes use of peer-to-peer checkpointing and allows user reliability levels to be jointly optimized based on an assessment of their individual requirements and total available resources in the data center. We show that the joint optimization can be efficiently solved by a distributed algorithm using dual decomposition. The solution improves resource utilization and presents an additional source of revenue to data center operators. Our validation results suggest a significant improvement of reliability over existing schemes.