MS-Rank: Multi-Metric and Self-Adaptive Root Cause Diagnosis for Microservice Applications

Meng Ma, Weilan Lin, Disheng Pan, Ping Wang
DOI: https://doi.org/10.1109/icws.2019.00022
2019-01-01
Abstract: This paper presents a self-adaptive root cause diagnosis framework, named MS-Rank, that analyzes multiple metrics collected from a microservice architecture. MS-Rank decomposes the task into four phases: impact graph construction, random walk diagnosis, result precision calculation, and metrics weight update. First, we introduce a series of basic and implied metrics into MS-Rank, and design an impact graph construction algorithm to discover causal relationships between services during anomalies. Second, we propose a random walk algorithm with forward, selfward and backward transitions to heuristically identify the root cause service. Third, we establish a self-optimizing mechanism that dynamically updates the confidence weight of each metric according to its diagnosis precision. We develop a prototype system and integrate MS-Rank into IBM Cloud to validate it and compare it with selected benchmarks. Experimental results show that MS-Rank offers fast identification and precise diagnosis results. Over multiple rounds of diagnosis, MS-Rank optimizes itself effectively.
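As a rough illustration of the random walk phase, the Python sketch below ranks services by the long-run visit frequency of a walk with forward, selfward and backward transitions over an impact graph. Everything specific in it is an assumption made for illustration and is not taken from the paper: the function names, the example graph and anomaly scores, the backward discount `rho`, the particular selfward rule, and the use of power iteration in place of sampling the walk.

```python
# Illustrative sketch only: names, scores, and transition rules are assumed,
# not the MS-Rank implementation.

def transition_probs(edges, score, rho=0.2):
    """Build per-node transition probabilities.

    edges: (src, dst) pairs meaning "src is impacted by dst" (assumed direction).
    score: per-service anomaly score in [0, 1], e.g. a metric correlation with
           the observed anomaly (assumed input).
    rho:   discount applied to backward transitions (assumed parameter).
    """
    nodes = sorted(score)
    out_nb = {n: set() for n in nodes}
    in_nb = {n: set() for n in nodes}
    for src, dst in edges:
        out_nb[src].add(dst)
        in_nb[dst].add(src)

    probs = {}
    for i in nodes:
        weights = {}
        # Forward: follow an impact edge toward a likely cause.
        for j in out_nb[i]:
            weights[j] = weights.get(j, 0.0) + score[j]
        # Backward: step back against an edge, discounted by rho.
        for j in in_nb[i]:
            weights[j] = weights.get(j, 0.0) + rho * score[j]
        # Selfward: stay put when this node looks worse than every neighbor.
        best = max(weights.values(), default=0.0)
        weights[i] = weights.get(i, 0.0) + max(0.0, score[i] - best)
        total = sum(weights.values())
        if total == 0.0:
            # Isolated node with zero score: the walk simply stays here.
            weights, total = {i: 1.0}, 1.0
        probs[i] = {j: w / total for j, w in weights.items()}
    return probs


def rank_services(edges, score, rho=0.2, iterations=200):
    """Rank services by approximate long-run visit frequency of the walk."""
    probs = transition_probs(edges, score, rho)
    mass = {n: 1.0 / len(probs) for n in probs}
    # Power iteration: push probability mass along transitions repeatedly.
    for _ in range(iterations):
        nxt = dict.fromkeys(probs, 0.0)
        for i, row in probs.items():
            for j, p in row.items():
                nxt[j] += mass[i] * p
        mass = nxt
    return sorted(mass, key=mass.get, reverse=True)


# Hypothetical four-service application: frontend depends on cart and catalog,
# and cart depends on db.
EDGES = [("frontend", "cart"), ("frontend", "catalog"), ("cart", "db")]
SCORES = {"frontend": 0.3, "cart": 0.6, "catalog": 0.2, "db": 0.9}

if __name__ == "__main__":
    print(rank_services(EDGES, SCORES))
```

In this toy setup the walk concentrates on `db`, the service with the highest score at the end of the impact chain. A self-adaptive variant in the spirit of the abstract would compute `SCORES` as a weighted combination of several metrics and adjust those weights after each diagnosis round; that step is not sketched here.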