Self-adaptive Failure Detector for Peer-to-Peer Distributed System Considering the Link Faults
Yanzhang He, Xiaohong Jiang, Changbo Dai, Zikun Fan
DOI: https://doi.org/10.1007/978-3-319-67952-5_6
2017-01-01
Abstract: Distributed computing is now prevalent in artificial intelligence applications because the computation capacity of a single computing node is limited. A distributed computing system generally contains a large number of computing nodes, so failures are to be expected rather than exceptional. Failure detection therefore plays an important role in recovering the system and in improving its availability and performance. Traditional failure detectors simply treat a link fault as a node fault, which hampers resource utilization, fault localization, and fast repair. We present a self-adaptive Link-based Failure Detection Agreement (DLFDA) with an improved node fault detection algorithm that can accurately distinguish node faults from link faults. DLFDA dynamically adjusts the detection structure to increase the coverage of link fault detection, and uses a Gossip protocol to distribute fault diagnosis results to other system members, which substantially reduces the impact on system performance. Experimental results show that the method meets the requirements of its theoretical design.
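The distinction between a node fault and a link fault can be illustrated with a minimal sketch. The abstract does not describe the DLFDA algorithm itself, so the code below is not the authors' method: it assumes a generic indirect-probing scheme in which an observer that misses a direct heartbeat asks other peers to probe the target on its behalf. All names (`classify_fault`, `probe_via`, `reachable`, the node labels `n0` to `n3`) are hypothetical.

```python
from typing import Callable, Iterable


def classify_fault(target: str,
                   direct_ok: bool,
                   helpers: Iterable[str],
                   probe_via: Callable[[str, str], bool]) -> str:
    """Return 'healthy', 'link_fault', or 'node_fault' for `target`.

    Assumption (not from the paper): if any helper peer can still reach
    the target, the target is alive and only the observer's link is broken.
    """
    if direct_ok:
        return "healthy"
    # Direct heartbeat missed: ask other peers to probe the target.
    if any(probe_via(helper, target) for helper in helpers):
        return "link_fault"
    return "node_fault"


# Toy topology for illustration: one crashed node and one broken link.
crashed = {"n3"}
broken_links = {frozenset(("n0", "n2"))}


def reachable(src: str, dst: str) -> bool:
    """True if both endpoints are up and the link between them works."""
    if src in crashed or dst in crashed:
        return False
    return frozenset((src, dst)) not in broken_links


def probe_via(helper: str, target: str) -> bool:
    """Observer n0 reaches `target` indirectly through `helper`."""
    return reachable("n0", helper) and reachable(helper, target)


def diagnose(target: str) -> str:
    """Classify `target` from the point of view of observer n0."""
    helpers = [n for n in ("n1", "n2", "n3") if n != target]
    return classify_fault(target, reachable("n0", target), helpers, probe_via)
```

In this toy setup, `diagnose("n2")` reports a link fault because `n1` can still reach `n2`, while `diagnose("n3")` reports a node fault because no helper can reach it. A full detector would additionally disseminate these verdicts to other members, for example by gossip, which this sketch omits.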