Asynchronous recovery protocols for distributed systems

Hwang, K.W., Tsai, W.T.
DOI: https://doi.org/10.1109/CMPSAC.1988.17232
1988-01-01
Abstract: The authors address the problem of error recovery in a system of distributed communication processes. They show that if each process can detect its local computation errors while establishing the recovery points, then the amount of process dependencies can be reduced by exploiting the temporal ordering of message communication among the processes. The proposed approach allows processes to proceed independently during normal computation, and can be further improved to accommodate independent rollback without explicit coordination. The authors also discuss the handling of messages that originate from, or are received by, tasks that later abort. Simulation studies indicate that the approach taken achieves a much higher throughput than the synchronous approach.