eHotSnap: An Efficient and Hot Distributed Snapshots System for Virtual Machine Cluster

Bo Li, Lei Cui, Zhiyu Hao, Lun Li, Yongji Liu, Yongnan Li
DOI: https://doi.org/10.1109/tpds.2023.3272014
IF: 5.3
2023-01-01
IEEE Transactions on Parallel and Distributed Systems
Abstract: With the popularity of IaaS clouds, many distributed and networked applications run in virtual machine clusters (VMCs). Distributed snapshots of a VMC are a practical approach to guaranteeing system reliability: after a failure, they rewind the system to an intermediate state so that applications can continue execution from a point near the failure. However, applications running in the VMC suffer long disruption and significant performance degradation because distributed snapshots are costly, especially when they are designed to guarantee global consistency of the VMC snapshots. This paper presents eHotSnap, which takes distributed snapshots of a VMC efficiently. eHotSnap divides the native snapshot into a low-cost transient snapshot and a high-cost memory snapshot, and coordinates the VM snapshots immediately after the transient snapshots. In this way, it decouples coordination from the high-cost snapshots, so the distributed snapshots are taken (logically complete) within a second. It then performs the memory snapshot with a two-layer optimization, which first employs de-duplication to reduce the amount of snapshot data and then uses a priority queue to serve guest write operations preferentially. We have implemented a prototype of eHotSnap on QEMU/KVM. The experimental results demonstrate the effectiveness and efficiency of the proposed approach.
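The abstract names the two layers of the memory-snapshot optimization but not their data structures. The following Python is a minimal illustrative sketch of the two ideas, not the authors' QEMU/KVM implementation. The function names `dedup_pages` and `save_order`, the SHA-256 content hash, and the two priority levels are all assumptions made for illustration.

```python
import hashlib
import heapq
from itertools import count

# Assumed priority levels: lower number is served first.
# Pages the guest is trying to write outrank pages saved in the background.
GUEST_WRITE, BACKGROUND = 0, 1


def dedup_pages(pages):
    """Layer 1 (assumed form): keep each distinct page content only once.

    pages maps a page number to its bytes. Returns (store, page_map), where
    store maps a content digest to the bytes and page_map maps each page
    number to the digest of its content.
    """
    store = {}
    page_map = {}
    for pfn, data in pages.items():
        digest = hashlib.sha256(data).hexdigest()
        store.setdefault(digest, data)  # identical content is stored once
        page_map[pfn] = digest
    return store, page_map


def save_order(background_pfns, guest_written_pfns):
    """Layer 2 (assumed form): order page saves so guest-written pages go first."""
    seq = count()  # tie-breaker keeps FIFO order within one priority level
    heap = []
    for pfn in background_pfns:
        heapq.heappush(heap, (BACKGROUND, next(seq), pfn))
    for pfn in guest_written_pfns:
        heapq.heappush(heap, (GUEST_WRITE, next(seq), pfn))

    order, seen = [], set()
    while heap:
        _, _, pfn = heapq.heappop(heap)
        if pfn not in seen:  # a page promoted by a guest write is saved once
            seen.add(pfn)
            order.append(pfn)
    return order
```

In this sketch, a page that the guest writes while the background save is in progress jumps ahead of the remaining background pages, so the guest write waits only for that one page to be saved.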
computer science, theory & methods; engineering, electrical & electronic