Research and application of fault tolerance framework for grid service based on OGSA

Ping Kuang, Hai Jin, Pingpeng Yuan, Hanhua Chen
DOI: https://doi.org/10.3321/j.issn:1671-4512.2005.z1.008
2005-01-01
Abstract: The Open Grid Services Architecture (OGSA) requires scalable and flexible fault-tolerance mechanisms that meet differing requirements, such as supporting diverse failure-handling strategies and separating those strategies from application code. We propose a hierarchical fault-tolerance framework based on OGSA. In this framework, diverse strategies and mechanisms can be flexibly configured to meet the different fault-tolerance requirements of services. We present implementations of several mechanisms, including hot standby of service instances in an instance pool and service recovery based on checkpointing. The framework and its mechanisms have been implemented in HUSTgrid. Experimental results from CoGIS, an application deployed on the HUSTgrid platform that requires multiple fault-tolerance mechanisms, show that the framework and its mechanisms work efficiently.
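The two mechanisms named in the abstract can be illustrated with a minimal sketch. This is not the HUSTgrid implementation, which the abstract does not detail. The class names `HotStandbyPool` and `CheckpointedTask`, the `ServiceFailure` exception, and the in-memory checkpoint are all hypothetical, chosen only to show failure handling kept outside the application code.

```python
from typing import Callable, List, Optional


class ServiceFailure(Exception):
    """Raised by a (hypothetical) service instance when a call fails."""


class HotStandbyPool:
    """Keeps pre-started instances; a failed one is dropped and the next standby takes over."""

    def __init__(self, instances: List[Callable[[str], str]]):
        if not instances:
            raise ValueError("pool needs at least one instance")
        self._instances = list(instances)

    def invoke(self, request: str) -> str:
        last_error: Optional[Exception] = None
        # Try the active instance first, then each standby in order.
        while self._instances:
            active = self._instances[0]
            try:
                return active(request)
            except ServiceFailure as err:
                last_error = err
                self._instances.pop(0)  # discard the failed instance, promote the standby
        raise ServiceFailure("all instances in the pool failed") from last_error


class CheckpointedTask:
    """Runs steps in order and records progress so a rerun resumes after the last completed step."""

    def __init__(self, steps: List[Callable[[int], int]]):
        self._steps = steps
        self.checkpoint = 0  # index of the next step to run (assumed kept in memory here)
        self.state = 0       # result carried from step to step

    def run(self) -> int:
        while self.checkpoint < len(self._steps):
            # State and checkpoint advance only after a step succeeds,
            # so a failure leaves the last good checkpoint intact.
            self.state = self._steps[self.checkpoint](self.state)
            self.checkpoint += 1
        return self.state
```

In this sketch the caller only calls `invoke` or `run`; the choice of recovery strategy lives in the wrapper, which mirrors the separation of failure handling from application code that the abstract describes. A real grid service would persist checkpoints to durable storage rather than keep them in memory.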