Research and Application of Fault-Tolerance Based on Watershed Model Grid Platform

Wang Zhijian, Ling Shang, Feng Xu
DOI: https://doi.org/10.1109/CSSE.2008.444
2008-01-01
Abstract: This paper develops a systematic scheme, based on a lightweight Grid technique, for building a watershed computational platform. The scheme takes advantage of widely deployed local networks to make full use of non-dedicated distributed computing resources. To cope with the unpredictability of the overall system, MPICH-T, a fault-tolerance model based on a trust model, was adopted. Its checkpointing, based on pessimistic logging, allows a process to be re-executed on a single node and tasks to be migrated across multiple nodes, and it also ensures the portability of the system on the watershed model Grid platform. Finally, several experiments were carried out on this platform. The results show that the platform performs well despite a slight time delay, and that the fault-tolerance mechanism based on the MPICH-T model is a good fit for the watershed model Grid platform.
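The abstract does not describe MPICH-T's internals. As a generic illustration of the underlying idea, checkpointing combined with pessimistic message logging, here is a minimal Python sketch. It assumes a deterministic process whose state depends only on the messages it receives, and a storage area that survives node failures. The names `StableStorage`, `LoggedProcess`, `receive`, `take_checkpoint` and `recover` are hypothetical and are not part of MPICH-T or any MPI library.

```python
import copy
from dataclasses import dataclass, field


def initial_state() -> dict:
    # Hypothetical process state: a running total of received values.
    return {"total": 0, "count": 0}


@dataclass
class StableStorage:
    """Hypothetical stand-in for storage that survives node failures."""
    checkpoint: dict = field(default_factory=initial_state)
    message_log: list = field(default_factory=list)


class LoggedProcess:
    """Illustrative sketch only, not MPICH-T's actual implementation."""

    def __init__(self, storage: StableStorage):
        self.storage = storage
        self.state = initial_state()

    def receive(self, msg: int) -> None:
        # Pessimistic rule: log the message to stable storage *before*
        # it is allowed to change the process state.
        self.storage.message_log.append(msg)
        self._apply(msg)

    def _apply(self, msg: int) -> None:
        # Deterministic handler: same messages in same order -> same state.
        self.state["total"] += msg
        self.state["count"] += 1

    def take_checkpoint(self) -> None:
        # Save the current state; messages received before this point
        # are no longer needed for recovery, so the log is truncated.
        self.storage.checkpoint = copy.deepcopy(self.state)
        self.storage.message_log.clear()

    @classmethod
    def recover(cls, storage: StableStorage) -> "LoggedProcess":
        # Restart (on the same node or a different one): restore the last
        # checkpoint, then replay the logged messages without re-logging them.
        proc = cls(storage)
        proc.state = copy.deepcopy(storage.checkpoint)
        for msg in storage.message_log:
            proc._apply(msg)
        return proc


storage = StableStorage()
p = LoggedProcess(storage)
for m in [3, 4]:
    p.receive(m)
p.take_checkpoint()
for m in [5, 6]:
    p.receive(m)

# Simulated node failure: the in-memory process is lost.
del p
q = LoggedProcess.recover(storage)
print(q.state)  # {'total': 18, 'count': 4}
```

Because every message is logged before it takes effect, recovery never depends on messages that were lost with the failed node; the price is a write to stable storage on each receive, which is one plausible source of the kind of time delay the abstract reports.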