Improving the Performance of Hypervisor-Based Fault Tolerance

Jun Zhu, Wei Dong, Zhefu Jiang, Xiaogang Shi, Zhen Xiao, Xiaoming Li
DOI: https://doi.org/10.1109/ipdps.2010.5470357
2010-01-01
Abstract: Hypervisor-based fault tolerance (HBFT), a checkpoint-recovery mechanism, is an emerging approach to sustaining mission-critical applications. Based on virtualization technology, HBFT provides an economical and transparent solution. However, these advantages currently come at the cost of substantial overhead during failure-free operation, especially for memory-intensive applications. This paper presents an in-depth examination of HBFT and options for improving its performance. Based on the behavior of memory accesses across checkpointing epochs, we introduce two optimizations to the memory tracking mechanism: read fault reduction and write fault prediction. These two optimizations improve the mechanism by 31.1% and 21.4%, respectively, for some applications. We then present software superpage, which efficiently maps large memory regions between virtual machines (VMs). With these optimizations, HBFT performance improves by a factor of 1.4 to 2.2 and reaches about 60% of that of the native VM.
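The abstract names write fault prediction without describing it, so the following is only a toy sketch of the general idea, not the paper's implementation. It assumes a simple heuristic (pages written in the previous epoch are predicted to be written again and are pre-marked dirty so their first write does not trap); the function name `simulate` and the page-number traces are hypothetical.

```python
from typing import List, Set, Tuple


def simulate(epochs: List[List[int]], predict: bool) -> Tuple[int, int]:
    """Return (write_faults, pages_copied) summed over all epochs.

    Each epoch is the sequence of page numbers written during that epoch.
    Assumed heuristic (not taken from the paper): a page written in the
    previous epoch is predicted to be written again in the next one.
    """
    faults = 0
    copied = 0
    prev_written: Set[int] = set()
    for writes in epochs:
        # At the start of an epoch every page is write-protected, except
        # predicted pages, which are pre-marked dirty and left writable.
        dirty: Set[int] = set(prev_written) if predict else set()
        written: Set[int] = set()
        for page in writes:
            if page not in dirty:
                faults += 1  # first write to a protected page traps
                dirty.add(page)
            written.add(page)
        # Every page marked dirty is transferred at the checkpoint,
        # including mispredicted pages that were never written.
        copied += len(dirty)
        prev_written = written
    return faults, copied


if __name__ == "__main__":
    trace = [[1, 2, 3], [1, 2, 3, 4], [1, 2, 5]]
    print(simulate(trace, predict=False))  # (10, 10)
    print(simulate(trace, predict=True))   # (5, 12)
```

In this toy trace, prediction halves the number of write faults but copies two extra pages, which illustrates the trade-off any such predictor has to balance: fewer traps against wasted checkpoint traffic for mispredicted pages.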