Automatic reconfiguration of COW

Youhui Zhang, Dongsheng Wang, Weiming Zheng
2000-01-01
Tien Tzu Hsueh Pao/Acta Electronica Sinica
Abstract: The Cluster of Workstations (COW) has become one of the leading technologies in the parallel processing field. Building a highly available COW requires research into its system reconfiguration techniques. This paper first describes the reconfiguration model of a COW, the checkpoint setting and rollback recovery mechanism, and fault detection. On this basis, we introduce ChaRM, a Checkpointing-based Rollback Recovery and Migration System implemented by the authors, and present the main design features that allow it to achieve high availability.