Communication-Based Prevention of Non-P-Pattern

Yi-Wei Ci, Zhan Zhang, De-Cheng Zuo, Zhi-Bo Wu, Xiao-Zong Yang
DOI: https://doi.org/10.1109/srds.2009.29
2009-01-01
Abstract: A central issue in the design of checkpointing protocols is how to improve the autonomy of checkpointing while keeping computation loss under control. To address this problem, this paper proposes a time-based multi-cycle checkpointing protocol in which processes are allowed to take checkpoints with their desired checkpoint cycles. So that recent checkpoints can be used to form a consistent global checkpoint, a communication-based checkpoint cycle adjustment approach is also proposed, in which the checkpoint cycle adjustment of each process follows a P-pattern. Simulation results show that the rollback deviation of the proposed protocol can be well controlled with low checkpointing overhead.