A Bayesian Network Approach to Modeling IT Service Availability using System Logs

Rui Zhang, Eric Cope, L. Heusler, Feng Cheng
2009-01-01
Abstract: The complexity of today’s IT systems makes capturing the behavior of these environments increasingly difficult and calls for automated modeling solutions. We present an approach to generating a probabilistic (Bayesian) network for modeling IT service availability from information reported in system logfiles, including service desk problem tickets, configuration management databases, and system event monitoring logs. In particular, we harvest these data to derive a Bayesian network structure that captures failure causality between system components, and subsequently use the problem tickets to quantify the network parameters. Experiments based on a major European bank deployment show that our approach can generate models of reasonable accuracy even when only limited amounts of data are available.