Abstract: In a large-scale cloud environment, many key performance indicators (KPIs) of entities are monitored in real time. These multivariate time series consist of high-dimensional, noisy, random, and time-dependent data. As a common method in artificial intelligence for IT operations (AIOps), time series anomaly detection has been widely studied and applied. However, existing detection methods do not fully consider the influence of multiple factors and cannot quickly and accurately detect anomalies in the multivariate KPIs of entities. They also cannot determine fine-grained root cause locations for detected anomalies, and they often require abundant normal data for model training, which are difficult to obtain. To solve these problems, we propose a long short-term memory (LSTM)-based semisupervised variational autoencoder (VAE) anomaly detection strategy called LR-SemiVAE. First, LR-SemiVAE uses a VAE to perform feature dimension reduction and reconstruction of multivariate time series data and judges whether an entity is abnormal by calculating the reconstruction probability score. Second, by introducing an LSTM network into the VAE encoder and decoder, the model can fully learn the time dependence of multivariate time series. Third, LR-SemiVAE introduces a classifier to predict data labels, which reduces the dependence on the original labeled data during model training. Finally, through a new method of calculating the evidence lower bound (ELBO) loss function, LR-SemiVAE attends to normal patterns and ignores abnormal patterns during training, which reduces the time cost of removing random anomalies and noisy data. However, because LSTM is limited in learning the long-term dependence of time series data, we build on LR-SemiVAE and propose a transformer-based semisupervised VAE anomaly detection and localization strategy called RT-SemiVAE for cluster systems with complex service dependencies. This method learns the long-term dependence of multivariate time series by introducing a transformer with a parallel multihead attention mechanism, while LSTM is used to capture short-term dependence; the parallel computation also markedly reduces model training time. After RT-SemiVAE detects entity anomalies, it traces the root entities according to the obtained service dependence graph and locates the root causes at the indicator level. We verify the strategies on public data sets and with a system prototype that we constructed. Experimental results show that, compared with existing baseline methods, the LR-SemiVAE and RT-SemiVAE strategies detect anomalies more quickly and accurately and localize the root causes of anomalies accurately and at a fine granularity.
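The abstract does not give the scoring or tracing formulas, so the following is only a rough sketch of how a reconstruction probability score and a dependency-graph trace could fit together. It assumes a Gaussian decoder that returns a mean and standard deviation per indicator for several latent draws, and it assumes that a root entity is an anomalous entity with no anomalous dependencies. All function names, the threshold, and the example graph are hypothetical and are not taken from the paper.

```python
import numpy as np


def reconstruction_log_prob(x, mu_samples, sigma_samples):
    """Per-indicator reconstruction log-probability (assumed Gaussian decoder).

    x:             shape (D,), the observed KPI vector for one entity
    mu_samples:    shape (L, D), decoder means for L latent draws
    sigma_samples: shape (L, D), decoder standard deviations for L latent draws
    Returns shape (D,): the log-likelihood averaged over the L draws.
    """
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu_samples, dtype=float)
    sigma = np.asarray(sigma_samples, dtype=float)
    log_pdf = -0.5 * np.log(2.0 * np.pi * sigma**2) - (x - mu) ** 2 / (2.0 * sigma**2)
    return log_pdf.mean(axis=0)


def is_anomalous(per_indicator_log_prob, threshold):
    """Flag the entity when its total log-probability falls below a threshold.

    The threshold is a hypothetical tuning parameter.
    """
    return bool(np.sum(per_indicator_log_prob) < threshold)


def rank_indicators(per_indicator_log_prob, names):
    """Order indicators from least to most likely (most suspicious first)."""
    order = np.argsort(per_indicator_log_prob, kind="stable")
    return [names[i] for i in order]


def trace_root_entities(depends_on, anomalous):
    """Return anomalous entities that have no anomalous dependency.

    depends_on: dict mapping an entity to the entities it calls
    anomalous:  set of entities flagged by the detector
    This tracing rule is an illustrative assumption, not the paper's algorithm.
    """
    return sorted(
        e for e in anomalous
        if not any(d in anomalous for d in depends_on.get(e, []))
    )
```

For example, with `depends_on = {"web": ["api"], "api": ["db"], "db": []}` and anomalies on `web` and `api`, the trace returns `api`, since `web` may only be affected through its dependency. Ranking the per-indicator scores of that entity then points to the indicators that the model reconstructs worst.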

Semisupervised anomaly detection of multivariate time series based on a variational autoencoder

Robust and Unsupervised KPI Anomaly Detection Based on Highly Sensitive Conditional Variational Auto-Encoders

Variance error of multi-classification based anomaly detection for time series data

LARA: A Light and Anti-overfitting Retraining Approach for Unsupervised Time Series Anomaly Detection

Self-adversarial variational autoencoder with spectral residual for time series anomaly detection

VELC: A New Variational AutoEncoder Based Model for Time Series Anomaly Detection

Self-Supervised Variational Graph Autoencoder for System-Level Anomaly Detection

Multivariate time series anomaly detection with variational autoencoder and spatial–temporal graph network

NVAE-GAN Based Approach for Unsupervised Time Series Anomaly Detection

Disentangled Anomaly Detection for Multivariate Time Series

Unsupervised Anomaly Detection on Microservice Traces through Graph VAE

Weakly Augmented Variational Autoencoder in Time Series Anomaly Detection

Attention Based CNN-LSTM Network for Anomaly Pattern Classification of Multivariate Time Series

Detection of Anomalies in Multivariate Time Series Using Ensemble Techniques

Online Data Drift Detection for Anomaly Detection Services based on Deep Learning towards Multivariate Time Series

Anomaly Detection for Time Series Using VAE-LSTM Hybrid Model

A Deep Neural Network for Unsupervised Anomaly Detection and Diagnosis in Multivariate Time Series Data

Variational transformer-based anomaly detection approach for multivariate time series

CNN and LSTM based Encoder-Decoder for Anomaly Detection in Multivariate Time Series

Unsupervised Anomaly Detection Using Variational Auto-Encoder based Feature Extraction

VESC: a new variational autoencoder based model for anomaly detection