Disconnected Agreement in Networks Prone to Link Failures

Bogdan S. Chlebus, Dariusz R. Kowalski, Jan Olkowski, Jedrzej Olkowski
2023-06-26
Abstract: We consider deterministic distributed algorithms for reaching agreement in synchronous networks of arbitrary topologies. Links are bi-directional and prone to failures, while nodes stay non-faulty at all times. A faulty link may omit messages. Agreement among nodes is understood as holding in each connected component of the network obtained by removing faulty links. We call "disconnected agreement" the algorithmic problem of reaching such agreement. We introduce the concept of stretch, which is the number of connected components of the network obtained by removing faulty links, minus $1$, plus the sum of the diameters of these connected components. We define the concepts of "fast" and "early-stopping" algorithms for disconnected agreement by referring to stretch. A network has $n$ nodes and $m$ links. Nodes are normally assumed to know their own names and to be able to associate communication with local ports. If we additionally assume that a bound $\Lambda$ on stretch is known to all nodes, then there is an algorithm for disconnected agreement working in time $O(\Lambda)$ using messages of $O(\log n)$ bits. We give a general disconnected agreement algorithm operating in $n+1$ rounds that uses messages of $O(\log n)$ bits. Let $\lambda$ be the unknown stretch occurring in an execution; we give an algorithm working in time $(\lambda+2)^3$ and using messages of $O(n\log n)$ bits. We show that disconnected agreement can be solved in the optimal $O(\lambda)$ time, but at the cost of increasing message size to $O(m\log n)$. We also design an algorithm that uses only $O(n)$ non-faulty links and works in time $O(nm)$, while nodes start with their ports mapped to neighbors and messages carry $O(m\log n)$ bits. We prove lower bounds on the performance of disconnected-agreement solutions that refer to the parameters of evolving network topologies and the knowledge available to nodes.
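The definition of stretch in the abstract can be made concrete with a small computation. The following Python sketch is illustrative only and is not taken from the paper: the function name `stretch`, the representation of nodes as integers `0..n-1`, and the edge-list inputs `links` and `faulty` are all assumptions made here. It removes the faulty links, finds the connected components by breadth-first search, and returns the number of components minus 1 plus the sum of the component diameters.

```python
from collections import deque


def stretch(n, links, faulty):
    """Stretch of a network on nodes 0..n-1 after removing faulty links.

    Hypothetical helper: `links` and `faulty` are lists of node pairs;
    links are treated as bi-directional.
    """
    faulty_set = {frozenset(e) for e in faulty}
    adj = {v: [] for v in range(n)}
    for u, v in links:
        if frozenset((u, v)) not in faulty_set:
            adj[u].append(v)
            adj[v].append(u)

    def bfs(src):
        # Hop distances from src to every node reachable over non-faulty links.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        return dist

    seen = set()
    components = 0
    diameter_sum = 0
    for v in range(n):
        if v in seen:
            continue
        component = bfs(v)  # keys are exactly the nodes of v's component
        seen.update(component)
        components += 1
        # Diameter = largest eccentricity among the component's nodes.
        diameter_sum += max(max(bfs(u).values()) for u in component)
    return components - 1 + diameter_sum


if __name__ == "__main__":
    cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
    print(stretch(4, cycle, []))        # intact 4-cycle
    print(stretch(4, cycle, [(0, 1)]))  # one faulty link leaves a path
```

For a 4-node cycle with no faulty links the sketch gives stretch 2 (one component of diameter 2); with one faulty link the network becomes a path of diameter 3, so the stretch is 3. A 4-node path has stretch 3 whether zero, one, or all of its links fail, since each lost link adds a component while reducing the sum of diameters by one.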
Distributed, Parallel, and Cluster Computing