Lightweight Virtual Machine Checkpoint and Rollback for Long-running Applications.
Lei Cui, Zhiyu Hao, Lun Li, Haiqiang Fei, Zhenquan Ding, Bo Li, Peng Liu
DOI: https://doi.org/10.1007/978-3-319-27137-8_42
2015-01-01
Abstract: Checkpoint/rollback is an effective approach to guarantee that long-running applications can complete in the face of failures. However, it does not come for free. The application suffers long downtime and a performance penalty while it is being checkpointed or rolled back, which adds overhead to application execution time. The problem gets worse in virtualized environments, mainly because virtual machines are heavyweight. This paper proposes warmCR, a lightweight checkpoint/rollback system for virtual machines that aims to reduce the overhead it adds to application execution time. First, warmCR employs the redirect-on-write approach to create the disk checkpoint and leverages the copy-on-write method to create the memory checkpoint live, so that both the downtime and the checkpoint duration are reduced. Second, we propose a working-set-based rollback approach that provides short downtime without compromising application performance. Third, workload-aware batched processing is proposed to achieve a trade-off between downtime and performance loss. In addition to presenting warmCR, we detail its implementation and provide extensive experimental results to demonstrate its efficiency and effectiveness.
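To make the copy-on-write idea behind live memory checkpointing concrete, here is a minimal sketch in Python. It is not warmCR's implementation: the class `CowMemory`, its method names, and the 4-byte "pages" are all hypothetical, chosen only to show the general technique. Taking the checkpoint only marks pages as protected, a guest write to a protected page saves the old content first, and a background pass saves the remaining pages while the guest keeps running.

```python
class CowMemory:
    """Toy page-granular memory with a copy-on-write checkpoint (illustrative only)."""

    def __init__(self, num_pages):
        self.pages = [bytes(4) for _ in range(num_pages)]  # live guest memory
        self.saved = {}        # page index -> content as of the checkpoint
        self.pending = set()   # protected pages not yet saved

    def begin_checkpoint(self):
        # Short pause: mark every page as protected, copy nothing yet.
        self.saved = {}
        self.pending = set(range(len(self.pages)))

    def write(self, index, data):
        # A write to a protected page preserves the old content first.
        if index in self.pending:
            self.saved[index] = self.pages[index]
            self.pending.discard(index)
        self.pages[index] = data

    def background_copy(self, budget):
        # Background saver: copy up to `budget` untouched pages per call.
        for index in sorted(self.pending)[:budget]:
            self.saved[index] = self.pages[index]
            self.pending.discard(index)

    def checkpoint_image(self):
        # The image is complete only once every page has been saved.
        assert not self.pending, "checkpoint still in progress"
        return [self.saved[i] for i in range(len(self.pages))]


mem = CowMemory(4)
mem.write(1, b"old1")      # state before the checkpoint
mem.begin_checkpoint()
mem.write(1, b"new1")      # old content of page 1 is saved before the overwrite
mem.background_copy(2)     # saves pages 0 and 2
mem.write(3, b"new3")      # page 3 was still pending, so its old content is saved
mem.background_copy(2)     # nothing left to copy
image = mem.checkpoint_image()
```

In this sketch the checkpoint image reflects memory exactly as it was when `begin_checkpoint` ran, even though the guest kept writing afterwards. A real hypervisor would rely on page-protection faults rather than an explicit `write` method.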