Abstract: The microservices architecture is widely used in cloud-based application development, where applications are built from a series of small, autonomous, functionally independent services. This architectural approach is known for high cohesion, high availability, low coupling, and strong scalability. Detecting runtime system point anomalies in microservices architectures is crucial for enhancing Quality of Service (QoS). Furthermore, identifying the classes of detected anomalies is critical in practical applications. However, because microservices systems are a highly dynamic distributed computing architecture, real-time system anomaly detection across distributed, independent microservices is challenging. To address these challenges, we propose the System Anomaly Detection and Multi-Classification based on Multi-Task Feature Fusion Federated Learning (SADMC-MT-FF-FL) framework. First, we introduce a distributed learning framework based on Multi-task Federated Learning (MT-FL) to construct multi-classification anomaly detection models for each microservice. Second, to identify complex system anomaly patterns and features during the runtime of microservices, we develop a feature extractor based on External Attention Mechanism and Multi-channel Residual Structure (EA-MRS). Finally, we design a Local–Global Feature-based Parallel Knowledge Transfer (LGF-PKT) framework, which uses parallel knowledge transfer to parallelize weight updates for local and global features. To validate the effectiveness of our approach, we conducted comprehensive comparative experiments on the microservices benchmark platforms Sock-Shop and Train-Ticket.
In multi-class system anomaly detection experiments, SADMC-MT-FF-FL outperforms the best baseline method by 28.3% and 27.8% in Macro F1 and Micro F1, respectively, on Train-Ticket, and by 8.8% and 8.6% on Sock-Shop. Additionally, we conducted comparison experiments on three public datasets: SWaT, SMD, and SKAB. The F1 score was 0.5% higher than that of the centralized methods on SMD, and 6% and 2.8% higher than that of the federated learning-based method on SWaT and SKAB, respectively. Source code is available at: https://github.com/icc-lab-xhu1/SADMC-MT-FF-FL.
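
The results above are reported in Macro F1 and Micro F1, which can diverge sharply when anomaly classes are imbalanced. The following is a minimal sketch of how the two averages are commonly computed for a multi-class task; it is not taken from the authors' repository, and the function names and the class labels ("normal", "cpu", "mem") are hypothetical, chosen only for illustration.

```python
from collections import Counter


def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # the predicted class gains a false positive
            fn[truth] += 1  # the true class gains a false negative
    # Macro: average the per-class F1 scores, so every class weighs equally.
    macro = sum(f1_from_counts(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1_from_counts(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Hypothetical labels: the rare "cpu" anomaly is missed entirely.
y_true = ["normal", "normal", "normal", "normal", "cpu", "mem"]
y_pred = ["normal", "normal", "normal", "normal", "normal", "mem"]
macro, micro = macro_micro_f1(y_true, y_pred)
print(f"Macro F1 = {macro:.4f}, Micro F1 = {micro:.4f}")
```

In this toy case the missed rare class pulls Macro F1 well below Micro F1, which is why reporting both is informative for imbalanced anomaly classes.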
