TraceGra: A Trace-Based Anomaly Detection for Microservice Using Graph Deep Learning

Jian Chen, Fagui Liu, Jun Jiang, Guoxiang Zhong, Dishi Xu, Zhuanglun Tan, Shangsong Shi
DOI: https://doi.org/10.1016/j.comcom.2023.03.028
2022-01-01
SSRN Electronic Journal
Abstract: Traces are widely used to detect anomalies in distributed microservice systems because they can precisely reconstruct user request paths. However, most existing trace-based anomaly detection approaches treat a trace as a sequence of microservice invocations annotated with response times. This ignores both the graph structure of the trace and the abnormal resource consumption that can arise in the complex distributed environments where microservices are deployed. In this paper, we propose TraceGra, an unsupervised encoder–decoder anomaly detection approach. TraceGra first provides a unified graph representation that combines traces with the performance metrics of the containers. It then uses a graph neural network (GNN) and a long short-term memory network (LSTM) to extract topological and temporal features, respectively. Finally, it combines the two parts of the loss, using two hyperparameters, into the anomaly score. Evaluation on an open-source dataset and on a local dataset collected from an ARM server cluster shows that TraceGra achieves high precision (0.97) and recall (0.93), outperforming some state-of-the-art approaches with an average increase of 0.1 in F1-score.
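The scoring step can be illustrated with a minimal sketch. It assumes the two loss parts are combined as a weighted sum and that a trace is flagged when the score exceeds a threshold. The abstract does not name the loss terms, give the exact formula, or describe a thresholding rule, so `LossParts`, `first_loss`, `second_loss`, `alpha`, `beta`, and `threshold` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class LossParts:
    """Two loss terms produced for one trace graph (placeholder names)."""
    first_loss: float   # assumption: one part of the model's loss
    second_loss: float  # assumption: the other part of the model's loss


def anomaly_score(parts: LossParts, alpha: float = 1.0, beta: float = 1.0) -> float:
    """Combine the two loss parts using two hyperparameters.

    A weighted sum is assumed here; the abstract only says the two-part
    loss is added with two hyperparameters.
    """
    return alpha * parts.first_loss + beta * parts.second_loss


def is_anomalous(parts: LossParts, threshold: float,
                 alpha: float = 1.0, beta: float = 1.0) -> bool:
    """Flag a trace when its score exceeds a threshold (thresholding is assumed)."""
    return anomaly_score(parts, alpha, beta) > threshold
```

For example, with loss parts 0.25 and 0.5 and hyperparameters `alpha=2.0`, `beta=4.0`, the score under this assumed formula is 2.5, which a threshold of 2.0 would flag as anomalous.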