A Low-Overhead Cooperative Failure Detector

Jiaxi Liu, Jian Dong, Zhibo Wu, Jin Wu, Jinghui Lan, Jiaxin Yu
DOI: https://doi.org/10.1109/imccc.2015.177
2015-01-01
Abstract: Failure detectors are a fundamental component for ensuring the high availability of large-scale distributed systems. The growing popularity of and demand for such systems have increased the overhead and complexity of failure detection, which poses a challenge to their further development. To address this challenge, this paper proposes a new failure detector, S-AFD, which combines adaptive failure detection based on QoS (quality of service) with a cooperative mechanism that shares negative messages among different active nodes. It not only reduces the detection overhead but also adapts to varying network conditions. Experimental analysis shows that S-AFD clearly improves on the performance of traditional failure detector implementations.
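The abstract does not give S-AFD's algorithm, so the following is only a minimal illustrative sketch of the two ideas it names: a timeout that adapts to observed heartbeat arrivals, and monitors that share negative messages (suspicions) with one another. Every name and parameter here (AdaptiveDetector, CooperativeMonitor, window, margin) is hypothetical, and the mean-inter-arrival estimate is a generic adaptive technique assumed for illustration, not the paper's method.

```python
from collections import deque


class AdaptiveDetector:
    """Tracks one node; the timeout follows recent heartbeat inter-arrival times."""

    def __init__(self, window=5, margin=0.5):
        self.arrivals = deque(maxlen=window)  # recent arrival timestamps (seconds)
        self.margin = margin                  # hypothetical safety margin (seconds)

    def heartbeat(self, now):
        self.arrivals.append(now)

    def deadline(self):
        """Expected next arrival plus margin, or None with fewer than two samples."""
        if len(self.arrivals) < 2:
            return None
        times = list(self.arrivals)
        gaps = [b - a for a, b in zip(times, times[1:])]
        return times[-1] + sum(gaps) / len(gaps) + self.margin

    def suspects(self, now):
        d = self.deadline()
        return d is not None and now > d


class CooperativeMonitor:
    """One monitoring node; accepts suspicions reported by other monitors."""

    def __init__(self):
        self.detectors = {}
        self.suspected = set()

    def on_heartbeat(self, node, now):
        self.detectors.setdefault(node, AdaptiveDetector()).heartbeat(now)
        self.suspected.discard(node)  # a fresh heartbeat clears the suspicion

    def check(self, now):
        """Return nodes newly suspected locally, i.e. those worth sharing with peers."""
        fresh = {
            node
            for node, det in self.detectors.items()
            if det.suspects(now) and node not in self.suspected
        }
        self.suspected |= fresh
        return fresh

    def on_negative_message(self, node):
        # Assumption: a peer's suspicion is adopted without waiting for a local timeout.
        self.suspected.add(node)
```

In this sketch, a monitor that detects a missed deadline would forward the result of check() to its peers, which call on_negative_message() instead of each waiting out its own timeout. That is one plausible way sharing negative messages could reduce detection overhead; whether S-AFD works this way is not stated in the abstract.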