A New Approach Of Failure Detection For Large-Scale Distributed Systems

Xiangzhan Yu, Xiaochun Yuri
2006-01-01
Abstract: Failure detection at large scale is a central problem in modern distributed systems. Motivated by an analysis of the shortcomings of current approaches to large-scale failure detection, we propose a hierarchical failure detection method that combines the characteristics of hierarchical protocols and gossip-style protocols. The method divides the nodes of a large-scale distributed network into groups. Within a group, direct detection is used; between groups, a gossip protocol is applied so that all nodes take part in failure detection. The main advantage is a reduction in both detection time and network load. We test and analyze this hierarchical approach comprehensively, and the results indicate that its timeliness, accuracy, adaptability, and extensibility meet the requirements we set for large-scale failure detection.
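The two-level idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify message formats, timeouts, or how gossip peers are chosen. The names `Group`, `detect_within_group`, `gossip_merge`, `gossip_round`, and `suspected` are hypothetical, and the timestamp-based merge rule and random peer selection are assumptions chosen for illustration.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Group:
    """A group of nodes plus its current view of the whole system (hypothetical structure)."""
    name: str
    members: set
    # node id -> (time the status was observed, alive flag)
    view: dict = field(default_factory=dict)


def detect_within_group(group, last_heartbeat, now, timeout):
    """Direct detection: a member is suspected if its last heartbeat is older than `timeout`."""
    for node in group.members:
        age = now - last_heartbeat.get(node, float("-inf"))
        group.view[node] = (now, age <= timeout)


def gossip_merge(receiver, sender):
    """Gossip step (assumed rule): the receiver adopts every entry that is newer than its own."""
    for node, (observed_at, alive) in sender.view.items():
        if node not in receiver.view or observed_at > receiver.view[node][0]:
            receiver.view[node] = (observed_at, alive)


def gossip_round(groups, rng):
    """Each group exchanges views with one randomly chosen other group."""
    for group in groups:
        peer = rng.choice([g for g in groups if g is not group])
        gossip_merge(group, peer)
        gossip_merge(peer, group)


def suspected(group):
    """Nodes this group currently believes to have failed."""
    return sorted(node for node, (_, alive) in group.view.items() if not alive)
```

In this sketch each group first runs `detect_within_group` on its own members, and one or more calls to `gossip_round` then spread the resulting views so that a failure observed in one group becomes visible to the others. Only group-level views cross group boundaries, which is one way the reduced network load mentioned in the abstract could arise.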