A Flexible Architecture for Statistical Learning and Data Mining from System Log Streams

Wei Xu, Peter Bodík, David Patterson
2004-01-01
Abstract: Modern computer systems are instrumented to generate huge amounts of system log data. This data contains valuable information for managing the system, localizing failures, and performing recovery. However, the complexity of these systems greatly surpasses what human operators can understand, so automated analysis systems are beginning to be used. Because the statistical algorithms require preprocessing, the extremely high volume of data cannot be processed using ad-hoc scripts. We present a flexible, modular, and scalable architecture for statistical learning from large data streams that can readily process such volumes. We built a prototype and evaluated it using system log data from a commercial on-line service. Moreover, the results of the analysis were genuinely useful to the on-line service operators.