A decentralized fault tolerance model based on level of performance for grid environment

Mohammed Rebbah, Yahya Slimani, Abdelkader Benyettou, Lionel Brunie
DOI: https://doi.org/10.1007/s10586-015-0497-x
2015-10-17
Cluster Computing
Abstract: Computational grids have the potential to solve large-scale scientific problems using heterogeneous and geographically distributed resources. At this scale, computer resource and network failures are no longer exceptions but part of normal system behavior. Therefore, one of the most valuable characteristics of grid tools, apart from the performance they can achieve, is fault tolerance, which is a significant and complex issue in grid computing systems. In this paper, we propose a fault tolerance model for grid computing systems named DCFT. This model is based on dynamic colored graphs and does not replicate computer resources. The proposed fault tolerance model consists of two stages. In the first stage, each node is described by a state vector, and we assign each attribute of the state vector one of three colors (green, blue, or red) based on its level of performance. In the second stage, we classify the nodes of the grid into three categories: computer resources that are identical in terms of performance, those that are more efficient, and those that are less efficient. We use the colors of the nodes to develop a new fault tolerance strategy based on the level of performance. The proposed model is simulated using the SimGrid simulator and GraphStream. Experimental results show that the proposed model performs very well in a large grid environment.
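The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the coloring rule or the classification rule, so everything below is an assumption made for illustration. In particular, the attribute names (`cpu`, `mem`), the thresholds `low` and `high`, the reading of green as high, blue as medium, and red as low performance, and the comparison of nodes by a summed color rank are all hypothetical.

```python
from enum import Enum


class Color(Enum):
    GREEN = "green"
    BLUE = "blue"
    RED = "red"


# Assumed ordering of colors (not stated in the abstract):
# red = low, blue = medium, green = high performance.
RANK = {Color.RED: 0, Color.BLUE: 1, Color.GREEN: 2}


def color_attribute(value, low=0.33, high=0.66):
    """Stage 1 (assumed rule): color one normalized attribute in [0, 1]."""
    if value >= high:
        return Color.GREEN
    if value >= low:
        return Color.BLUE
    return Color.RED


def color_state_vector(state, low=0.33, high=0.66):
    """Color every attribute of a node's state vector."""
    return {name: color_attribute(v, low, high) for name, v in state.items()}


def score(colors):
    """Hypothetical aggregate: sum of the color ranks of all attributes."""
    return sum(RANK[c] for c in colors.values())


def classify_peers(node, peers, states):
    """Stage 2 (assumed rule): group peers relative to `node` by score."""
    ref = score(color_state_vector(states[node]))
    groups = {"identical": [], "more_efficient": [], "less_efficient": []}
    for peer in peers:
        s = score(color_state_vector(states[peer]))
        if s == ref:
            groups["identical"].append(peer)
        elif s > ref:
            groups["more_efficient"].append(peer)
        else:
            groups["less_efficient"].append(peer)
    return groups


# Hypothetical state vectors with normalized attribute values.
states = {
    "n1": {"cpu": 0.5, "mem": 0.5},
    "n2": {"cpu": 0.9, "mem": 0.9},
    "n3": {"cpu": 0.1, "mem": 0.2},
    "n4": {"cpu": 0.4, "mem": 0.6},
}
groups = classify_peers("n1", ["n2", "n3", "n4"], states)
```

In this sketch, a recovery strategy could prefer a peer from the `identical` group, then `more_efficient`, and fall back to `less_efficient` only when no other node is available. That preference order is also an assumption and is not taken from the abstract.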