Rollback Recovery in Distributed Systems Using Loosely Synchronized Clocks

Z. Tong, Richard Y. Kain, Wei‐Tek Tsai
DOI: https://doi.org/10.1109/71.127264
IF: 5.3
1992-01-01
IEEE Transactions on Parallel and Distributed Systems
Abstract: A rollback recovery scheme for distributed systems is proposed. The state-save synchronization among processes is implemented by bounding clock drifts such that no state-save synchronization messages are required. Since the clocks are only loosely synchronized, the synchronization overhead can be negligible in many applications. An interprocess communication protocol which encodes state-save progress information within message frames is introduced to checkpoint consistent system states. A rollback recovery algorithm that will force a minimum number of nodes to roll back after failures is developed.
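The idea of carrying state-save progress inside message frames can be illustrated with a small sketch. This is not the paper's protocol; it is a generic timer-driven checkpointing illustration under stated assumptions: each process saves state once per fixed period on its own local clock, every message carries the sender's checkpoint index, and a receiver that is behind saves state before delivering the message. The names `Message`, `Process`, `ckpt_index`, `tick`, and `period` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    payload: str
    ckpt_index: int  # sender's checkpoint index, piggybacked on the frame (assumed encoding)


@dataclass
class Process:
    name: str
    period: float  # state-save period measured on the local clock (assumption)
    ckpt_index: int = 0
    state: list = field(default_factory=list)
    checkpoints: dict = field(default_factory=dict)

    def __post_init__(self):
        # Checkpoint 0 is the initial, empty state.
        self.checkpoints[0] = []

    def take_checkpoint(self, index):
        # Save a copy of the current state under the given index.
        self.ckpt_index = index
        self.checkpoints[index] = list(self.state)

    def tick(self, local_time):
        # Timer-driven save: no synchronization messages are exchanged.
        due = int(local_time // self.period)
        if due > self.ckpt_index:
            self.take_checkpoint(due)

    def send(self, payload):
        return Message(payload, self.ckpt_index)

    def receive(self, msg):
        # If the sender has already passed a save point that this process
        # has not reached, save first so the checkpoint excludes the message.
        if msg.ckpt_index > self.ckpt_index:
            self.take_checkpoint(msg.ckpt_index)
        self.state.append(msg.payload)


if __name__ == "__main__":
    p = Process("p", period=10.0)
    q = Process("q", period=10.0)
    p.tick(10.2)           # p's clock is slightly ahead; it saves checkpoint 1
    q.tick(9.9)            # q's clock has not reached the save point yet
    q.receive(p.send("hello"))
    print(q.ckpt_index, q.checkpoints[1], q.state)
```

In this sketch the receiver's forced save keeps checkpoint 1 free of any message sent after the sender's checkpoint 1, which is the consistency property that the piggybacked index is meant to protect when clocks drift within a bound.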