Situation-Aware Multivariate Time Series Anomaly Detection Through Active Learning and Contrast VAE-Based Models in Large Distributed Systems

Zhihan Li, Youjian Zhao, Yitong Geng, Zhanxiang Zhao, Hanzhang Wang, Wenxiao Chen, Huai Jiang, Amber Vaidya, Liangfei Su, Dan Pei
DOI: https://doi.org/10.1109/jsac.2022.3191341
IF: 16.4
2022-01-01
IEEE Journal on Selected Areas in Communications
Abstract: The massive amounts of monitoring data in network applications bring an urgent need for intelligent operation in large distributed systems. The key problem is precisely detecting anomalies in multivariate time series (MTS) monitoring metrics with awareness of different application scenarios. Unsupervised MTS anomaly detection methods aim to detect data anomalies from historical MTS without considering out-of-band information (including user feedback and background information such as code deployment status), which leads to poor performance in practice. To take advantage of the out-of-band information, we propose ACVAE, an MTS anomaly detection algorithm based on active learning and contrast VAE-based detection models, which simultaneously learns the normal and anomalous patterns of MTS data for anomaly detection. We also use a learnable prior to capture system status from the background information. Moreover, we propose a query model for VAE-based methods, which can learn to query labels of the most useful instances to train the detection model. We evaluate our algorithm on three different monitoring situations in eBay’s search back-end systems. ACVAE achieves F1 scores ranging from 0.68 to 0.96 with only 3% labels, significantly outperforming the best competing methods by 0.18 to 0.50 and even surpassing a supervised ensemble method designed by domain experts at eBay.
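The abstract describes querying labels for only the most useful instances under a small label budget. As a rough illustration of that general idea, the sketch below uses plain uncertainty sampling: it asks for labels on the instances whose anomaly score lies closest to a decision threshold. This is not the ACVAE query model, which the abstract says is learned; the function name `select_queries`, the fixed threshold, and the example scores are all assumptions made for illustration.

```python
def select_queries(scores, threshold, budget_fraction=0.03):
    """Pick indices of instances to send to a human for labeling.

    Hypothetical helper (not from the paper): a generic uncertainty-sampling
    heuristic that stands in for a learned query model.

    scores          -- one anomaly score per instance
    threshold       -- decision threshold on the anomaly score
    budget_fraction -- fraction of instances that may be labeled
    """
    if not 0.0 < budget_fraction <= 1.0:
        raise ValueError("budget_fraction must be in (0, 1]")
    if not scores:
        return []

    # Number of labels we may request; always ask for at least one.
    budget = max(1, round(len(scores) * budget_fraction))

    # Instances nearest the threshold are the ones the detector is
    # least sure about, so they are ranked first.
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i] - threshold))
    return ranked[:budget]


# Illustrative scores only; these are not data from the paper.
example_scores = [0.05, 0.48, 0.93, 0.55, 0.10]
queried = select_queries(example_scores, threshold=0.5, budget_fraction=0.4)
```

With these illustrative scores and a budget of two labels, the two instances nearest the threshold (indices 1 and 3) are selected. A learned query model would replace the distance-to-threshold ranking with a trainable usefulness score.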