Generic and Robust Localization of Multi-dimensional Root Causes.

Zeyan Li, Chengyang Luo, Yiwei Zhao, Yongqian Sun, Kaixin Sui, Xiping Wang, Dapeng Liu, Xing Jin, Qi Wang, Dan Pei
DOI: https://doi.org/10.1109/issre.2019.00015
2019-01-01
Abstract: Operators of online software services periodically collect various measures with many attributes. When a measure becomes abnormal, indicating service problems such as reliability degradation, operators would like to rapidly and accurately localize the root cause attribute combinations within a huge multi-dimensional search space. Unfortunately, previous approaches are not generic or robust: they all suffer from one or more of the following limitations: impractical root cause assumptions, handling only directly collected measures but not derived ones, handling only anomalies with significant magnitudes but not insignificant yet important ones, requiring manual parameter fine-tuning, or being too slow. This paper proposes Squeeze, a generic and robust multi-dimensional root cause localization approach that is the first in the literature to overcome all of the above limitations. Through our novel bottom-up then top-down searching strategy and techniques based on our proposed generalized ripple effect and generalized potential score, Squeeze reaches a good trade-off between search speed and accuracy in a generic and robust manner. Case studies in several banks and an Internet company show that Squeeze can localize root causes much more rapidly and accurately than traditional manual analysis. Furthermore, our extensive experiments on semi-synthetic datasets show that the F1-score of Squeeze exceeds that of previous approaches by 0.4 on average, while its localization time is only about 10 seconds.