Semisupervised anomaly detection of multivariate time series based on a variational autoencoder

Ningjiang Chen,Huan Tu,Xiaoyan Duan,Liangqing Hu,Chengxiang Guo
DOI: https://doi.org/10.1007/s10489-022-03829-1
IF: 5.3
2022-07-05
Applied Intelligence
Abstract:In a large-scale cloud environment, many key performance indicators (KPIs) of entities are monitored in real time. These multivariate time series consist of high-dimensional, high-noise, random and time-dependent data. As a common method implemented in artificial intelligence for IT operations (AIOps), time series anomaly detection has been widely studied and applied. However, the existing detection methods cannot fully consider the influence of multiple factors and cannot quickly and accurately detect anomalies in multivariate KPIs of entities. Concurrently, fine-grained root cause locations cannot be determined for detected anomalies and often require abundant normal data that are difficult to obtain for model training. To solve these problems, we propose a long short-term memory (LSTM)-based semisupervised variational autoencoder (VAE) anomaly detection strategy called LR-SemiVAE. First, LR-SemiVAE uses VAE to perform feature dimension reduction and reconstruction of multivariate time series data and judges whether the entity is abnormal by calculating the reconstruction probability score. Second, by introducing an LSTM network into the VAE encoder and decoder, the model can fully learn the time dependence of multivariate time series. Then, LR-SemiVAE predicts the data labels by introducing a classifier to reduce the dependence on the original labeled data during model training. Finally, by proposing a new evidence lower bound (ELBO) loss function calculation method, LR-SemiVAE pays attention to the normal pattern and ignores the abnormal pattern during training to reduce the time cost of removing random anomaly and noise data. However, due to the limitations of LSTM in learning the long-term dependence of time series data, based on LR-SemiVAE, we propose a transformer-based semisupervised VAE anomaly detection and location strategy called RT-SemiVAE for cluster systems with complex service dependencies. This method learns the long-term dependence of multivariate time series by introducing a parallel multihead attention mechanism transformer, while LSTM is used to capture short-term dependence, and the introduction of parallel computing also markedly reduces model training time. After RT-SemiVAE detects entity anomalies, it traces the root entities according to the obtained service dependence graph and locates the root causes at the indicator level. We verify the strategies by using public data sets and constructing a system prototype. Experimental results show that compared with existing baseline methods, the LR-SemiVAE and RT-SemiVAE strategies can detect anomalies more quickly and accurately and perform fine-grained and accurate localization of the root causes of anomalies.
computer science, artificial intelligence
What problem does this paper attempt to address?