Research on heartbeat detection protocol in faulty processes and faulty links

Jian Dong, Hongwei Liu, Decheng Zuo, Xiaozong Yang
2006-01-01
WSEAS TRANSACTIONS ON COMPUTERS
Abstract: Heartbeat detection is one of the most important fault-detection methods in distributed systems. A heartbeat protocol allows two processes to monitor each other's state by exchanging messages periodically. However, a simple binary heartbeat cannot distinguish a faulty process from a faulty link, so different nodes can reach conflicting detection results. This paper presents a heartbeat protocol based on multiple master nodes (HPMM), which immediately and accurately detects and locates faulty components by using voting and election mechanisms among the master nodes. HPMM thus resolves the disagreement among detection results and also extends the system's continuous working time and improves its availability. In addition, detection costs can be reduced by distributing the workload across multiple master nodes.
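The core idea of using several observers to tell a crashed process apart from a broken link can be illustrated with a simple majority vote. The sketch below is a hypothetical simplification, not the authors' HPMM algorithm. It assumes each master node reports whether it received a heartbeat from the monitored process within its timeout, and that fewer than half of the master-to-process links fail at the same time. It does not model the election mechanism among master nodes. The names `classify`, `Verdict` and the report format are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Verdict:
    """Hypothetical classification result for one monitored process."""
    process_faulty: bool
    faulty_links: List[str]  # ids of masters whose link to the process is suspect


def classify(reports: Dict[str, bool]) -> Verdict:
    """Majority vote over heartbeat reports from several master nodes.

    `reports` maps a master-node id to True if that master received a
    heartbeat from the process within its timeout, False otherwise.
    Assumption (hypothetical): fewer than half of the links fail at once.
    """
    if not reports:
        raise ValueError("need at least one master-node report")
    missed = [m for m, ok in reports.items() if not ok]
    # A strict majority of silent masters suggests the process itself has
    # failed, rather than many independent links failing together.
    if 2 * len(missed) > len(reports):
        return Verdict(process_faulty=True, faulty_links=[])
    # Otherwise the process is treated as alive and each silent master's
    # link to it is flagged as suspect.
    return Verdict(process_faulty=False, faulty_links=sorted(missed))


if __name__ == "__main__":
    # Only m2 misses the heartbeat, so its link is blamed, not the process.
    print(classify({"m1": True, "m2": False, "m3": True}))
    # prints Verdict(process_faulty=False, faulty_links=['m2'])
```

Because a tie (for example, one of two masters silent) is resolved in favour of a link fault here, an odd number of master nodes avoids ambiguous votes in this simplified scheme.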