On the design of reactive approach with flexible checkpoint interval to tolerate faults in cloud computing systems

Mohammed Amoon, Nirmeen El-Bahnasawy, Samy Sadi, Manar Wagdi
DOI: https://doi.org/10.1007/s12652-018-1139-y
IF: 3.662
2018-11-15
Journal of Ambient Intelligence and Humanized Computing
Abstract: The likelihood of failures rises in cloud computing systems as a result of their unstable nature. Additionally, the size of a cloud computing system varies with time, so failures become a common occurrence. Failures have a high impact on cloud performance and on the expected benefits for both customers and providers. Fault tolerance is an essential challenge for cloud providers, who must mitigate the effects of failures and keep the Service Level Agreement (SLA) satisfied. Checkpointing is one of the best-known reactive fault tolerance techniques used in distributed computing. However, it can incur considerable overheads that depend on the checkpoint interval applied, and these overheads degrade the performance of the cloud. In this paper, a reactive fault tolerance approach based on checkpointing is proposed and evaluated with the aim of achieving better performance. The approach applies a flexible checkpoint interval to reduce overheads. Simulation experiments indicate superior performance of the approach in terms of power consumption, response time, monetary cost and cloud capacity.
computer science, information systems, telecommunications, artificial intelligence
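
The abstract notes that checkpointing overhead depends on the checkpoint interval. The sketch below illustrates that trade-off only; it is not the algorithm proposed in the paper, whose details are not given in this record. It assumes Young's classical first-order approximation for the overhead-minimizing interval, and the function names, the cost model, and the idea of re-estimating the mean time between failures (MTBF) from recent failure gaps are all hypothetical choices made for illustration.

```python
import math


def young_interval(checkpoint_cost: float, mtbf: float) -> float:
    """Young's first-order approximation: T = sqrt(2 * C * MTBF).

    checkpoint_cost (C) and mtbf must use the same time unit.
    """
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def overhead_fraction(interval: float, checkpoint_cost: float, mtbf: float) -> float:
    """First-order overhead per unit of useful work (illustrative model).

    C / T is the time spent writing checkpoints; T / (2 * MTBF) is the
    expected work lost and redone after a failure.
    """
    return checkpoint_cost / interval + interval / (2.0 * mtbf)


def flexible_interval(checkpoint_cost: float,
                      recent_failure_gaps: list[float],
                      default_mtbf: float) -> float:
    """Hypothetical 'flexible' policy: re-estimate MTBF from observed gaps
    between failures and recompute the interval; fall back to a default
    MTBF when no failures have been observed yet."""
    if recent_failure_gaps:
        mtbf = sum(recent_failure_gaps) / len(recent_failure_gaps)
    else:
        mtbf = default_mtbf
    return young_interval(checkpoint_cost, mtbf)


if __name__ == "__main__":
    cost, mtbf = 2.0, 100.0  # assumed values, in minutes
    best = young_interval(cost, mtbf)
    for interval in (best / 2, best, best * 2):
        print(f"interval={interval:6.1f}  overhead={overhead_fraction(interval, cost, mtbf):.3f}")
```

Under this simplified model, checkpointing too often wastes time writing checkpoints, while checkpointing too rarely increases the work lost per failure; adapting the interval as the observed failure rate changes is one plausible way to keep the combined overhead low.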