Root Cause Analysis of Anomalies of Multitier Services in Public Clouds.

Jianping Weng, Jessie Hui Wang, Jiahai Yang, Yang
DOI: https://doi.org/10.1109/tnet.2018.2843805
2018-01-01
IEEE/ACM Transactions on Networking
Abstract: Anomalies of multitier services running on a cloud platform can be caused by components of the same tenant or by performance interference from other tenants. If the performance of a multitier service degrades, we need to identify the root causes precisely so that the service can be recovered as soon as possible. In this paper, we argue that cloud providers are in a better position than tenants to solve this problem, and that the solution should be non-intrusive to tenants' services and applications. Based on these two considerations, we propose a solution with which cloud providers can help tenants localize the root causes of any anomaly. We design a non-intrusive method to capture the dependency relationships among components, which improves the feasibility of the root cause localization system. Our solution can find root causes regardless of whether they lie in the same tenant as the anomaly or in other tenants. Our proposed two-step localization algorithm exploits measurement data from both the application layer and the underlying infrastructure, together with a random walk procedure, to improve its accuracy. Our real-world experiments on a three-tier web application running in a small-scale cloud platform show a 38.9% improvement in mean average precision compared to current methods.
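The abstract does not spell out the two-step algorithm, so the following is only a generic sketch of how a random walk over a component dependency graph could rank root cause candidates. It is not the authors' method. The function name `random_walk_scores`, the graph, the anomaly weights, and the restart probability are all hypothetical, and the visit probabilities are computed by deterministic power iteration rather than by simulating individual walks.

```python
def random_walk_scores(edges, anomaly_score, start, restart=0.15, iters=200):
    """Rank components by the visit probability of a random walk with restart.

    edges: dict mapping a component to the components it depends on.
    anomaly_score: dict mapping a component to a non-negative weight
        (hypothetical; e.g. how abnormal its metrics look).
    start: the component where the anomaly was observed.
    """
    nodes = sorted(set(edges) | {d for deps in edges.values() for d in deps})
    prob = {n: 0.0 for n in nodes}
    prob[start] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        for n in nodes:
            mass = prob[n]
            if mass == 0.0:
                continue
            # With probability `restart`, the walker jumps back to the start.
            nxt[start] += restart * mass
            deps = edges.get(n, [])
            weights = [anomaly_score.get(d, 0.0) for d in deps]
            total = sum(weights)
            if total == 0.0:
                # No anomalous dependency to follow: the walker stays put.
                nxt[n] += (1.0 - restart) * mass
            else:
                # Move to a dependency in proportion to its anomaly weight.
                for d, w in zip(deps, weights):
                    nxt[d] += (1.0 - restart) * mass * w / total
        prob = nxt
    return sorted(prob.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical three-tier service: web -> app -> {db, cache}.
edges = {"web": ["app"], "app": ["db", "cache"]}
anomaly_score = {"app": 0.5, "db": 0.9, "cache": 0.1}
ranking = random_walk_scores(edges, anomaly_score, start="web")
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy setup the walker accumulates at the component with the highest anomaly weight among the dependencies, so `db` ranks first. A real system would derive both the graph and the weights from measurement data, which this sketch simply assumes as inputs.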