Fault Localization for Microservice Applications with System Logs and Monitoring Metrics

Qixun Zhang, Tong Jia, Zhonghai Wu, Qingxin Wu, Lichun Jia, Donglei Li, Yuqing Tao, Yutong Xiao
DOI: https://doi.org/10.1109/icccbda55098.2022.9778893
2022-01-01
Abstract: Microservices are widely used in enterprises because they scale well and can be updated quickly. However, while fine-grained modularity and service orientation reduce the complexity of system development, they greatly increase the complexity of system operation and maintenance, because system faults become more frequent and more complex. Fault localization, that is, identifying the faulty service and its root cause, is therefore important yet challenging for microservice applications. One of the main challenges of fault localization is fusing multiple data sources, because a system fault exhibits different features in each data source. It is thus necessary to fuse multiple data sources and build a unified model for fault localization. In this paper, we propose a fault localization approach that fuses system logs and monitoring metrics. Our approach first discovers service dependencies and then uses system logs and monitoring metrics to detect anomalies for each microservice. Finally, it locates the faulty service and recommends the root-cause system metrics based on the service dependencies and the detected anomalies. Experimental results show that the average precision of our approach is about 75%.
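The three steps named in the abstract (dependency discovery, per-service anomaly detection from logs and metrics, and localization over the dependency graph) can be illustrated with a minimal Python sketch. This is not the authors' method: the abstract does not specify how the sources are fused or how the fault is located, so the rules below are assumptions chosen for illustration. Specifically, a service is assumed anomalous if either its logs flag it or any metric score exceeds a threshold, and the faulty service is assumed to be an anomalous service with no anomalous downstream dependency. All function names, service names, scores, and the threshold are hypothetical.

```python
from typing import Dict, List, Set


def fuse_anomalies(
    log_anomalous: Set[str],
    metric_scores: Dict[str, Dict[str, float]],
    threshold: float,
) -> Set[str]:
    """Assumed fusion rule: a service is anomalous if its logs flag it
    or any of its metric anomaly scores exceeds the threshold."""
    metric_anomalous = {
        svc
        for svc, scores in metric_scores.items()
        if any(score > threshold for score in scores.values())
    }
    return log_anomalous | metric_anomalous


def locate_fault_services(
    deps: Dict[str, List[str]], anomalous: Set[str]
) -> List[str]:
    """Assumed localization rule: keep anomalous services whose direct
    dependencies (callees) are all healthy, since their anomalies cannot
    be explained by a downstream fault."""
    return sorted(
        svc
        for svc in anomalous
        if not any(callee in anomalous for callee in deps.get(svc, []))
    )


def rank_metrics(scores: Dict[str, float], top_k: int = 3) -> List[str]:
    """Recommend candidate root-cause metrics, highest anomaly score first."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:top_k]]


# Hypothetical call graph: caller -> list of callees.
deps = {"frontend": ["orders"], "orders": ["db"], "db": []}
# Hypothetical detector outputs.
log_anomalous = {"frontend", "orders"}
metric_scores = {
    "frontend": {"cpu": 0.1},
    "orders": {"cpu": 0.2, "latency": 0.9},
    "db": {"cpu": 0.95, "mem": 0.4},
}

anomalous = fuse_anomalies(log_anomalous, metric_scores, threshold=0.8)
faulty = locate_fault_services(deps, anomalous)
recommended = {svc: rank_metrics(metric_scores[svc]) for svc in faulty}
```

In this toy input all three services are anomalous, but only `db` has no anomalous dependency, so it is the one reported, with `cpu` ranked ahead of `mem` as the candidate root-cause metric.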