An Inference Algorithm for Probabilistic Fault Management in Distributed Systems

JG Ding, B Kramer, YC Bai, HS Chen
DOI: https://doi.org/10.1007/0-387-23198-6_15
2006-01-01
Abstract: With the proliferation of novel paradigms in distributed systems, including service-oriented computing, ubiquitous computing or self-organizing systems, an efficient distributed management system needs to work effectively even in the face of incomplete management information, uncertain situations and dynamic changes. In this paper, Bayesian networks are proposed to model dependencies between managed objects in distributed systems management. Based on probabilistic backward inference mechanisms, the so-called Strongest Dependency Route (SDR) algorithm is used to compute the set of most probable faults that may have caused an error or failure.
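To make the idea of backward inference over a dependency model more concrete, the following is a minimal sketch and not the paper's SDR algorithm, whose details the abstract does not give. It assumes one plausible reading of "strongest dependency route": each arc from a cause to an effect carries a dependency strength in (0, 1], the strength of a route is the product of the strengths along it, and the strongest route from an observed failure back to each candidate root cause is found with Dijkstra's algorithm on -log(strength) weights. The graph, node names, numeric strengths, and the function names `strongest_routes` and `rank_root_causes` are all hypothetical.

```python
import heapq
import math


def strongest_routes(parents, effect):
    """Return {ancestor: strength of its strongest backward route from `effect`}.

    `parents` maps a node to {parent: dependency strength in (0, 1]}.
    Maximizing a product of strengths is the same as minimizing the sum of
    -log(strength), so a standard Dijkstra search applies (assumed reading).
    """
    best = {effect: 0.0}          # accumulated -log(strength) per node
    heap = [(0.0, effect)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > best.get(node, math.inf):
            continue              # stale heap entry
        for parent, strength in parents.get(node, {}).items():
            new_cost = cost - math.log(strength)
            if new_cost < best.get(parent, math.inf):
                best[parent] = new_cost
                heapq.heappush(heap, (new_cost, parent))
    return {n: math.exp(-c) for n, c in best.items() if n != effect}


def rank_root_causes(parents, effect):
    """Rank nodes without parents (candidate root faults), strongest first."""
    strengths = strongest_routes(parents, effect)
    roots = {n: s for n, s in strengths.items() if not parents.get(n)}
    return sorted(roots.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical dependency graph: effect -> {cause: dependency strength}.
parents = {
    "service_down": {"server": 0.9, "network": 0.6},
    "server": {"power": 0.3, "disk": 0.8},
    "network": {"router": 0.7},
}

ranking = rank_root_causes(parents, "service_down")
for cause, strength in ranking:
    print(f"{cause}: {strength:.2f}")
```

In this toy graph the candidate root faults are ranked by the strength of their strongest route to the observed failure. A full Bayesian treatment would also account for prior fault probabilities and the complete conditional probability tables, which this sketch deliberately leaves out.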