A hybrid fault tolerance framework for SaaS services based on hidden Markov model

Feng Ye, Qian Huang, Zhijian Wang, ng Li
DOI: https://doi.org/10.1504/IJRS.2019.097022
2019-10-05
International Journal of Reliability and Safety
Abstract: With the rapid growth of cloud computing, a growing number of applications rely on cloud services for business-critical functions. However, failures that cause service downtime or produce invalid results in such applications can have consequences ranging from mere inconvenience to significant monetary penalties or even loss of human life. In critical systems, making cloud services highly dependable is one of the main challenges. Existing research shows that using fault injection to experimentally assess fault tolerance architectures for cloud services is still an open problem because of the complexity and diversity of failures in cloud environments. Therefore, we propose a hybrid fault tolerance framework that utilises replication and design diversity techniques for SaaS services. To verify the effectiveness of the framework in a variety of realistic failure scenarios, we introduce a mixed fault simulator based on the urn-and-ball formulation of the hidden Markov model. A series of experiments evaluates the reliability of a SaaS service in three configurations: a single service without replication, a single service with retry or reboot, and a service with spatial replication. The results show that the mixed fault simulator can flexibly simulate various faults in a cloud environment, and that both temporal and spatial redundancy improve the availability and reliability of the SaaS service.
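The urn-and-ball idea behind the fault simulator can be illustrated with a minimal Python sketch. It is not the authors' simulator: the state names, outcome names, probabilities, and function names below are all assumptions chosen for illustration. Each hidden state is an "urn", each observable call outcome is a "ball" drawn from the current urn, and the hidden state then moves according to a transition table. Spatial replication is modelled in the simplest possible way, as independent replicas where a request succeeds if any replica returns "ok"; design diversity and result voting are not modelled.

```python
import random

# Hypothetical hidden states ("urns") and observable outcomes ("balls").
STATES = ["healthy", "degraded", "failing"]
OUTCOMES = ["ok", "timeout", "crash", "invalid_result"]

# Assumed transition probabilities between hidden states (each row sums to 1).
TRANSITION = {
    "healthy":  [0.90, 0.08, 0.02],
    "degraded": [0.30, 0.50, 0.20],
    "failing":  [0.10, 0.30, 0.60],
}

# Assumed emission probabilities: each urn holds a different mix of balls.
EMISSION = {
    "healthy":  [0.97, 0.01, 0.01, 0.01],
    "degraded": [0.70, 0.15, 0.05, 0.10],
    "failing":  [0.20, 0.30, 0.30, 0.20],
}


def simulate_outcomes(n_calls, rng, start="healthy"):
    """Draw one outcome per call from the current urn, then move to the next urn."""
    state = start
    outcomes = []
    for _ in range(n_calls):
        outcomes.append(rng.choices(OUTCOMES, weights=EMISSION[state])[0])
        state = rng.choices(STATES, weights=TRANSITION[state])[0]
    return outcomes


def success_rate_single(n_calls, seed):
    """Fraction of calls that return "ok" for one unreplicated service."""
    rng = random.Random(seed)
    outcomes = simulate_outcomes(n_calls, rng)
    return outcomes.count("ok") / n_calls


def success_rate_replicated(n_calls, n_replicas, seed):
    """Fraction of calls for which at least one independent replica returns "ok"."""
    rng = random.Random(seed)
    replicas = [simulate_outcomes(n_calls, rng) for _ in range(n_replicas)]
    ok = sum(1 for calls in zip(*replicas) if "ok" in calls)
    return ok / n_calls


if __name__ == "__main__":
    print("single service:", success_rate_single(10_000, seed=1))
    print("three replicas:", success_rate_replicated(10_000, 3, seed=1))
```

Changing the emission table changes the mix of fault types, and changing the transition table changes how long faults persist, which is one way a single simulator of this kind could cover several failure scenarios.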