A Cooperative Checkpointing Algorithm with Message Complexity O(n)

WANG Dong-Sheng, SHAO Ming-Long
2003-01-01
Abstract: Cooperative checkpointing with rollback recovery is an effective fault-tolerance technique and has been widely used in parallel and distributed computer systems such as computer clusters. To reduce time and space overhead, this paper presents a cooperative checkpointing algorithm based on message counting. The algorithm reduces the message complexity of synchronization from O(n^2) to O(n), improving
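The abstract does not describe the protocol itself. The Python sketch below is a hypothetical illustration of one way message counting can bring checkpoint synchronization down to O(n) control messages. It is not the authors' algorithm. Assumptions: process 0 acts as coordinator, the REQUEST/reply/COMMIT round structure is invented for illustration, and the names Process, send, deliver_all, coordinator_round and all_to_all_markers are hypothetical. Each process keeps only two counters, sent and received. Because no channel can deliver more messages than were sent on it, equal global totals imply that no application message is still in transit, so the coordinator can commit a consistent checkpoint after about 3(n-1) messages. The all_to_all_markers function is an assumed O(n^2) baseline that places one marker on every directed channel.

```python
from dataclasses import dataclass


@dataclass
class Process:
    """Hypothetical process state: only application-message counters are tracked."""
    pid: int
    sent: int = 0      # application messages this process has sent
    received: int = 0  # application messages this process has received


def send(src: Process, in_transit: list, dst: int) -> None:
    # Record an outgoing application message; it stays in transit until delivered.
    src.sent += 1
    in_transit.append(dst)


def deliver_all(procs: list, in_transit: list) -> None:
    # Deliver every in-transit application message to its destination process.
    while in_transit:
        procs[in_transit.pop()].received += 1


def coordinator_round(procs: list) -> tuple:
    """One hypothetical coordinator-driven checkpoint attempt.

    Returns (committed, control_messages). Process 0 is the coordinator and
    reads its own counters locally, so it exchanges messages with n - 1 peers.
    """
    n = len(procs)
    control = 0
    control += n - 1  # REQUEST sent to every other process
    control += n - 1  # each process replies with its (sent, received) counters
    total_sent = sum(p.sent for p in procs)
    total_received = sum(p.received for p in procs)
    if total_sent != total_received:
        return False, control  # messages still in transit: retry later
    control += n - 1  # COMMIT broadcast to every other process
    return True, control


def all_to_all_markers(n: int) -> int:
    # Assumed baseline: one marker on every directed channel, n * (n - 1) messages.
    return n * (n - 1)


if __name__ == "__main__":
    procs = [Process(i) for i in range(8)]
    in_transit = []
    send(procs[1], in_transit, 2)
    send(procs[3], in_transit, 0)
    print(coordinator_round(procs))  # (False, 14): two messages still in transit
    deliver_all(procs, in_transit)
    print(coordinator_round(procs))  # (True, 21): 3 * (8 - 1) control messages
    print(all_to_all_markers(8))     # 56
```

With 8 processes the counting round costs 21 control messages against 56 for the all-to-all baseline. The gap grows linearly versus quadratically as n increases. A failed round (counters unequal) adds another 2(n-1) messages per retry, which this sketch does not bound.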