Multilayered Fault Detection and Localization With Transformer for Microservice Systems

Jingyu Wang, Yuewei Li, Qi Qi, Yan Lu, Bo Wu
DOI: https://doi.org/10.1109/tr.2024.3356717
IF: 5.883
2024-01-01
IEEE Transactions on Reliability
Abstract: Software architecture is undergoing a transition from monolithic architecture to microservices to achieve resilience, agility, and scalability in the software life cycle. The complex dependencies among these microservices may lead to unexpected failures, which have become a major reliability concern for application providers. Existing fault detection and localization algorithms for microservice systems focus only on the relationships among microservices and cannot achieve finer granularity from a layered system perspective that includes microservices, containers, physical machines, and networks. To tackle this problem, we propose a multilayered method that deconstructs cloud-based microservices and connects the information from various layers to enhance the precision of fault detection and localization. The proposed Transformer encoder model detects container anomalies in the resource layer, while anomalies in the service layer are detected by decomposing and analyzing invocation latency. After the faulty area of the resource layer is determined from these anomalies, a multifactor root cause score is used to rank root cause metrics in the faulty area for localization. Evaluations were performed on three datasets: the Sock-Shop dataset we collected from an actual microservice system, the AIOps2020 preliminary dataset, and the SMD. Empirical investigations on these datasets show that our models improve the F1 score by approximately 0.25 for anomaly detection and the mean average precision by up to 0.54 for root cause localization, which underscores the utility of our models for managing microservice systems in practical scenarios.
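The abstract reports its results as an F1 score (anomaly detection) and a mean average precision (root cause localization). As a reading aid, the following is a minimal sketch of both metrics using their standard textbook definitions. It is not the authors' evaluation code, the abstract does not state the exact evaluation protocol, and the paper may define mean average precision differently (for example, as an average of precision-at-k values). The function names and the metric labels in the usage note are hypothetical.

```python
from typing import Iterable, Sequence, Set, Tuple


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for binary labels (1 = anomaly)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average precision of one ranked candidate list against the true root causes."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank  # precision at the rank of each correct item
    return total / len(relevant)


def mean_average_precision(cases: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of average_precision over fault cases (assumed IR-style definition)."""
    case_list = list(cases)
    if not case_list:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in case_list) / len(case_list)
```

For instance, a fault case whose ranked candidates are `["cpu", "mem", "net"]` with true root cause `{"mem"}` has an average precision of 0.5 under this definition, because the correct metric appears at rank 2.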
engineering, electrical & electronic, computer science, software engineering, hardware & architecture