Dynamic Checkpointing Policy in Heterogeneous Real-Time Standby Systems

Gregory Levitin,Liudong Xing,Yuanshun Dai,Vinod M. Vokkarane
DOI: https://doi.org/10.1109/tc.2017.2667659
IF: 3.183
2017-01-01
IEEE Transactions on Computers
Abstract:This paper models 1-out-of-N standby computing systems with a dynamic checkpointing policy. The system performs a real-time mission task that has to be accomplished within an allowed mission time. During the mission, to facilitate an effective failure recovery the system undergoes checkpointing procedures according to a policy that dynamically determines a checkpointing frequency based on the activated element and remaining work for completing the mission. System elements are heterogeneous; they can follow different, arbitrary types of time-to-failure distributions, have different performance and wait in different standby modes before their activation. A new numerical algorithm based on state space event transitions is first proposed to evaluate mission success probability of the real-time standby systems considered in this work. Additional new contributions are made by formulating and solving optimal dynamic checkpointing policy problems, as well as an integrated optimization problem that finds the optimal combination of checkpointing policy and element activation sequence maximizing mission success probability. Advantages of using the dynamic checkpointing policy over fixed even checkpoints are demonstrated through examples. Examples and results are also provided to illustrate effects of different mission and element parameters on mission success probability as well as on the optimal dynamic checkpointing policy.
What problem does this paper attempt to address?