SIMULATING THE RELIABILITY OF DISTRIBUTED SYSTEMS WITH UNRELIABLE NODES

ZHONGSHI HE, YUFANG TIAN, YINONG CHEN
2002-01-01
Abstract: This paper discusses the reliability of distributed systems in which nodes may fail with certain probabilities. A distributed system is modeled as a probabilistic graph G. We focus on communication reliability, characterized by a particular reliability attribute called the residual connectedness reliability and denoted by R(G). R(G) is the probability that all residual (surviving) nodes can reach each other. R(G) has been shown to be a useful measure, but computing it is #P-complete, and therefore at least as hard as any NP-complete problem. Research in this area has consequently focused on heuristic approaches. In this study, we first propose a deterministic bounding approach that yields an upper bound and a lower bound on R(G). To show that these bounds are tight, we demonstrate theoretically and numerically that the difference between the upper and lower bounds tends to zero as the number of nodes tends to infinity, provided the node failure probability is reasonably low (e.g., less than 0.1). In other words, for large distributed systems the two bounds give an accurate estimate of R(G). Unfortunately, this approach does not work well for small and medium-sized systems with a high node failure probability (e.g., greater than 0.1). In the second part of the paper, we present a new approach that combines a Monte Carlo simulation scheme with our deterministic bounding approach to obtain a probabilistic point estimator for R(G). We also determine the confidence interval of this estimator.
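To make the definition of R(G) concrete, the following is a minimal Python sketch, not the authors' method. It contrasts exact evaluation by enumerating all 2^n node-failure patterns, which shows why exact computation becomes infeasible as n grows, with a plain (crude) Monte Carlo estimator that uses a normal-approximation confidence interval. The sketch does not reproduce the paper's combined bounding-plus-simulation scheme, because the abstract does not state the bounds. Several assumptions are made here. Node failures are independent and share a single probability p. Links are perfectly reliable. Zero or one surviving node counts as connected, although conventions in the literature differ. All function names are hypothetical.

```python
import math
import random
from itertools import product


def is_connected(adj, alive):
    """True if the nodes in `alive` induce a connected subgraph of `adj`.

    Assumption: zero or one surviving node is treated as connected.
    """
    alive = set(alive)
    if len(alive) <= 1:
        return True
    start = next(iter(alive))
    seen, stack = {start}, [start]
    while stack:  # depth-first search restricted to surviving nodes
        u = stack.pop()
        for v in adj[u]:
            if v in alive and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == alive


def exact_reliability(adj, p):
    """Exact R(G) by summing over all 2^n up/down patterns (small n only).

    Assumption: every node fails independently with the same probability p.
    """
    nodes = list(adj)
    total = 0.0
    for states in product((True, False), repeat=len(nodes)):
        alive = [v for v, up in zip(nodes, states) if up]
        prob = (1 - p) ** len(alive) * p ** (len(nodes) - len(alive))
        if is_connected(adj, alive):
            total += prob
    return total


def monte_carlo_reliability(adj, p, trials, z=1.96, seed=None):
    """Crude Monte Carlo estimate of R(G) with a normal-approximation CI.

    z=1.96 gives roughly a 95% interval. The interval collapses to a point
    when the estimate is exactly 0 or 1 (a known weakness of this interval).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        alive = [v for v in adj if rng.random() >= p]  # node survives w.p. 1-p
        if is_connected(adj, alive):
            hits += 1
    r_hat = hits / trials
    half = z * math.sqrt(r_hat * (1 - r_hat) / trials)
    return r_hat, (max(0.0, r_hat - half), min(1.0, r_hat + half))


if __name__ == "__main__":
    # Hypothetical example: a 4-node ring. Only the two "opposite pair"
    # survivor sets are disconnected, so R = 1 - 2(1-p)^2 p^2.
    cycle4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    print(round(exact_reliability(cycle4, 0.1), 4))  # 0.9838
    print(monte_carlo_reliability(cycle4, 0.1, trials=20000, seed=1))
```

The enumeration loop grows as 2^n, while the simulation cost grows only with the number of trials. Because the interval's half-width shrinks as 1/sqrt(trials), a rare unreliability event needs many samples to resolve. This is one reason to combine simulation with deterministic bounds.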