PUTraceAD: Trace Anomaly Detection with Partial Labels based on GNN and PU Learning

Ke Zhang, Chenxi Zhang, Xin Peng, Chaofeng Sha
DOI: https://doi.org/10.1109/ISSRE55969.2022.00032
2022-01-01
Abstract: Distributed tracing has become an important part of microservice infrastructure, and learning-based trace analysis has been used to detect anomalies in microservice systems. Existing learning-based trace anomaly detection approaches either assume that trace patterns can be learned from normal execution or rely on fault injection to produce labeled traces (i.e., normal and anomalous ones). In practice, however, it is often difficult to ensure that normal execution contains no anomalous traces, or to obtain a large variety of normal and anomalous traces through fault injection. In this paper, we propose PUTraceAD, a trace anomaly detection approach that alleviates these problems. PUTraceAD represents a trace as a span causal graph with node features such as operation name, response code, and duration. Based on this graph representation, PUTraceAD trains a trace anomaly detection model that combines a GNN with PU (Positive and Unlabeled) learning. During training, PU learning optimizes the model parameters by estimating the data distribution. PUTraceAD can therefore train the model on a small set of labeled anomalous traces and a large set of unlabeled traces. Our evaluation shows that PUTraceAD outperforms existing unsupervised trace anomaly detection approaches and only slightly underperforms a supervised learning-based approach that takes full advantage of labeled traces.
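To make the PU learning idea concrete, the following is a minimal sketch of how a training loss could be computed from labeled anomalous traces and unlabeled traces only. The abstract does not say which PU estimator PUTraceAD uses, so the non-negative PU risk with a sigmoid surrogate loss shown here is an assumption chosen for illustration. The names `sigmoid_loss`, `nn_pu_risk`, and `prior` (the assumed fraction of anomalous traces among all traces) are hypothetical, and the scores stand in for the outputs of a GNN applied to span causal graphs.

```python
import math


def sigmoid_loss(score, label):
    """Sigmoid surrogate loss; small when the sign of score agrees with label (+1 or -1)."""
    return 1.0 / (1.0 + math.exp(label * score))


def mean(values):
    return sum(values) / len(values)


def nn_pu_risk(pos_scores, unl_scores, prior):
    """Non-negative PU risk (an assumed estimator, not confirmed by the abstract).

    pos_scores: model scores for labeled anomalous (positive) traces
    unl_scores: model scores for unlabeled traces
    prior:      assumed proportion of anomalous traces overall
    """
    # Risk of the labeled positives when treated as positive.
    pos_as_pos = mean([sigmoid_loss(s, +1) for s in pos_scores])
    # Risk of the labeled positives when treated as negative.
    pos_as_neg = mean([sigmoid_loss(s, -1) for s in pos_scores])
    # Risk of the unlabeled traces when treated as negative.
    unl_as_neg = mean([sigmoid_loss(s, -1) for s in unl_scores])

    # The unlabeled set mixes positives and negatives, so the positive share
    # is subtracted to estimate the risk on the (unseen) negative traces.
    negative_risk = unl_as_neg - prior * pos_as_neg

    # Clamping at zero keeps the estimate from going negative.
    return prior * pos_as_pos + max(0.0, negative_risk)


# Hypothetical scores: higher means "more likely anomalous".
labeled_anomalies = [3.0, 4.0]
unlabeled = [-3.0, -4.0, -2.0, 3.0]
print(nn_pu_risk(labeled_anomalies, unlabeled, prior=0.25))
```

A model that scores labeled anomalies high and most unlabeled traces low yields a smaller risk than one that does the opposite, which is what lets training proceed without any trace labeled as normal. In this sketch the class prior is supplied by hand; in practice it would have to be estimated or tuned.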