RootMiner: A Rapid Root Cause Location Method for KPIs with Multi-Dimensional Attributes

Yaxing Li, Yuanqing Xia, Yufeng Zhan, Runze Gao, Chuge Wu
DOI: https://doi.org/10.1109/cac53003.2021.9727417
2021-01-01
Abstract: Additive key performance indicators (KPIs) with multi-dimensional attributes are important monitoring indicators in internet companies. When an anomaly occurs in the overall KPI, locating its root cause is critical but challenging. The task poses two main challenges. First, root causes are difficult to locate in two typical scenarios: when the magnitude of the anomaly is not obvious, and when two kinds of abnormal variation interact. Second, the number of KPI attribute-value combinations is huge, which places high demands on the real-time performance of the algorithm. In this paper, a robust and rapid root cause location approach, RootMiner, is proposed to address these challenges. First, a new evaluation function is adopted to achieve good results in more complex scenarios. Second, a multi-tree data structure together with a pre-pruning strategy is applied to improve computational efficiency. On a real data set from Alibaba Cloud Computing, the experimental results show that RootMiner achieves a great improvement in effectiveness, with an average improvement of 40% compared with the state of the art. The results also show that RootMiner reduces the runtime from 10 s to 1 s on average.
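To make the setting concrete, the following is a minimal sketch of what an additive KPI with multi-dimensional attributes looks like and why the search space grows quickly. It is not RootMiner's evaluation function or its multi-tree structure, neither of which is specified in the abstract. The attribute names, the sample records, and the helper functions `aggregate`, `all_cuboids`, and `count_combinations` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical attributes and leaf records (not from the paper):
# each record is (attribute values, forecast KPI, actual KPI).
ATTRIBUTES = ("region", "isp", "device")
LEAVES = [
    ({"region": "east", "isp": "A", "device": "mobile"}, 100.0, 60.0),
    ({"region": "east", "isp": "B", "device": "mobile"}, 80.0, 50.0),
    ({"region": "west", "isp": "A", "device": "mobile"}, 90.0, 91.0),
    ({"region": "west", "isp": "B", "device": "desktop"}, 70.0, 69.0),
]


def aggregate(leaves, dims):
    """Sum forecast and actual values for each value combination of `dims`.

    Because the KPI is additive, the value of any coarser combination is
    simply the sum of the leaf records it covers.
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for attrs, forecast, actual in leaves:
        key = tuple(attrs[d] for d in dims)
        totals[key][0] += forecast
        totals[key][1] += actual
    return {key: tuple(values) for key, values in totals.items()}


def all_cuboids(attributes):
    """Yield every non-empty subset of attributes (each subset is one cuboid)."""
    for size in range(1, len(attributes) + 1):
        yield from combinations(attributes, size)


def count_combinations(cardinalities):
    """Count candidate attribute-value combinations over all cuboids.

    Each attribute is either fixed to one of its n values or left as a
    wildcard, giving prod(n_i + 1) choices; subtracting 1 drops the
    all-wildcard case, which is the overall KPI itself.
    """
    total = 1
    for n in cardinalities:
        total *= n + 1
    return total - 1


if __name__ == "__main__":
    print(aggregate(LEAVES, ()))           # overall KPI
    print(aggregate(LEAVES, ("region",)))  # one-dimensional cuboid
    print(len(list(all_cuboids(ATTRIBUTES))))
    print(count_combinations([2, 2, 2]))
```

In this toy data the drop in the overall KPI comes entirely from `region = east`, while `region = west` matches its forecast. With realistic attribute cardinalities the count returned by `count_combinations` grows multiplicatively, which is the efficiency concern the abstract raises.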