SHelp: Automatic Self-Healing for Multiple Application Instances in a Virtual Machine Environment.

Gang Chen, Hai Jin, Deqing Zou, Bing, Weizhong Qiang, Gang Hu
DOI: https://doi.org/10.1109/cluster.2010.18
2010-01-01
Abstract: When multiple instances of an application are running on multiple virtual machines, an interesting problem is how to use the fault-handling result from one application instance to heal the same fault when it occurs on sibling instances, and hence to ensure high service availability in a cloud computing environment. This paper presents SHelp, a lightweight runtime system that can survive software failures in a virtual machine framework. It applies weighted rescue points and error virtualization techniques to make applications bypass the faulty path effectively. The rescue point database adopts a two-level storage hierarchy so that applications running on different virtual machines can share error-handling information, which reduces redundancy and allows faster, more effective recovery from future faults caused by the same bugs. A Linux prototype is implemented and evaluated using four web server applications that contain various types of bugs. Our experimental results show that SHelp allows server applications to recover from these bugs in just a few seconds with modest performance overhead.
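The idea of sharing weighted rescue points through a two-level store can be illustrated with a minimal Python sketch. It is not SHelp's implementation: the class names (`RescuePointStore`, `TwoLevelRescueDB`), the fault-signature strings, and the rule that a successful recovery raises a rescue point's weight by one are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import defaultdict


class RescuePointStore:
    """One storage level: maps a fault signature to weighted rescue points.

    Hypothetical structure; the abstract does not describe the real schema.
    """

    def __init__(self):
        # fault signature -> {rescue point name -> weight}
        self._weights = defaultdict(dict)

    def get(self, fault):
        # Return a copy so callers cannot mutate the store by accident.
        return dict(self._weights.get(fault, {}))

    def bump(self, fault, rescue_point, delta=1):
        current = self._weights[fault].get(rescue_point, 0)
        self._weights[fault][rescue_point] = current + delta


class TwoLevelRescueDB:
    """Per-VM view: a private local store backed by a store shared across VMs."""

    def __init__(self, shared_store):
        self.local = RescuePointStore()
        self.shared = shared_store

    def candidates(self, fault):
        # Merge both levels, keeping the larger weight for each rescue point.
        merged = self.shared.get(fault)
        for rp, weight in self.local.get(fault).items():
            merged[rp] = max(merged.get(rp, 0), weight)
        # Assumption: a higher weight means the rescue point is tried first.
        return sorted(merged, key=lambda rp: (-merged[rp], rp))

    def record_success(self, fault, rescue_point):
        # Record the outcome locally and publish it for sibling instances.
        self.local.bump(fault, rescue_point)
        self.shared.bump(fault, rescue_point)


shared = RescuePointStore()
vm_a = TwoLevelRescueDB(shared)
vm_b = TwoLevelRescueDB(shared)

# An instance on VM A recovers from a fault at a given rescue point ...
vm_a.record_success("null-deref@parse_request", "handle_request")
# ... and an instance on VM B can reuse that result for the same fault.
print(vm_b.candidates("null-deref@parse_request"))  # ['handle_request']
```

In this sketch the shared store stands in for the upper storage level; in a real deployment it would live outside any single virtual machine so that sibling instances can query it.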