Abstract: Anomaly detection, the time-sensitive recognition of unusual events in large-scale systems, is critical in many industries, e.g., bank fraud, enterprise systems, and medical alerts. Large-scale systems often grow in size and complexity over time, so anomaly detection algorithms need to adapt to changing structures. A hierarchical approach takes advantage of the implicit relationships and localized context in complex systems. The features of complex systems may vary drastically in data distribution and capture different aspects of the system from multiple data sources; taken together, they provide a more complete view of the system. In this paper, two datasets are considered: the first comprises system metrics from machines running on a cloud service, and the second comprises application metrics from a large-scale distributed software system with inherent hierarchies and interconnections among its system nodes. A comparison of the changepoint-based PELT algorithm, cognitive learning-based Hierarchical Temporal Memory algorithms, Support Vector Machines, and Conditional Random Fields provides a basis for proposing a Hierarchical Global-Local Conditional Random Field approach that accurately captures anomalies in complex systems across various features. Hierarchical algorithms can learn the intricacies of specific features and use them in a global, abstracted representation to detect anomalous patterns robustly across multi-source feature data and distributed systems. A graphical network analysis of complex systems can further refine datasets by mining relationships among the available features, which can benefit hierarchical models. Furthermore, hierarchical solutions adapt well to changes at a localized level: they learn from new data and changing environments when parts of a system are overhauled, and translate these learnings into a global view of the system over time.
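The abstract lists the changepoint-based PELT (Pruned Exact Linear Time) algorithm as one of the compared methods. As a minimal sketch, assuming a univariate metric series, a squared-error (change-in-mean) segment cost, and a user-chosen `penalty`, PELT can be written as follows; the cost function, the penalty value, and the function name `pelt` are illustrative assumptions, not the configuration used in the paper.

```python
from itertools import accumulate


def pelt(signal, penalty):
    """Sketch of PELT with a squared-error (change-in-mean) cost.

    Assumption: cost and penalty are illustrative choices, not the paper's.
    Returns the sorted changepoint indices (start of each new segment).
    """
    n = len(signal)
    # Prefix sums so each segment cost is computed in O(1).
    s1 = [0.0] + list(accumulate(signal))
    s2 = [0.0] + list(accumulate(x * x for x in signal))

    def cost(a, b):
        # Sum of squared deviations from the mean of signal[a:b].
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    f = [0.0] * (n + 1)   # f[t]: optimal penalised cost of signal[:t]
    f[0] = -penalty
    prev = [0] * (n + 1)  # prev[t]: start of the last segment in that optimum
    candidates = [0]
    for t in range(1, n + 1):
        scores = [(f[tau] + cost(tau, t) + penalty, tau) for tau in candidates]
        f[t], prev[t] = min(scores)
        # Pruning step (K = 0 for this cost): discard any tau with
        # f[tau] + cost(tau, t) > f[t]; it can never be optimal later.
        candidates = [tau for score, tau in scores if score - penalty <= f[t]]
        candidates.append(t)

    # Backtrack through the stored segment starts.
    changepoints = []
    t = n
    while t > 0:
        t = prev[t]
        if t > 0:
            changepoints.append(t)
    return sorted(changepoints)


if __name__ == "__main__":
    series = [0.0] * 50 + [5.0] * 50  # a level shift at index 50
    print(pelt(series, penalty=10.0))
```

A larger `penalty` yields fewer, more confident changepoints, while a smaller one flags more level shifts; treating each detected changepoint as a candidate anomaly onset is one plausible way to use this output as a baseline.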

RARE: a labeled dataset for cloud-native memory anomalies

Anomaly Detection in Large-Scale Cloud Systems: An Industry Case and Dataset

Anomaly detection in the CERN cloud infrastructure

Learning Multi-Pattern Normalities in the Frequency Domain for Efficient Time Series Anomaly Detection

A Spatiotemporal Deep Learning Approach for Unsupervised Anomaly Detection in Cloud Systems

Evaluating Real-time Anomaly Detection Algorithms - the Numenta Anomaly Benchmark

Synthetic Time Series for Anomaly Detection in Cloud Microservices

Cross-dataset Time Series Anomaly Detection for Cloud Systems

DeCorus: Hierarchical Multivariate Anomaly Detection at Cloud-Scale

MoniLog: An Automated Log-Based Anomaly Detection System for Cloud Computing Infrastructures

Benchmarking Anomaly Detection Methods: Insights From the UCR Time Series Anomaly Archive

Low-count Time Series Anomaly Detection

iAnomaly: A Toolkit for Generating Performance Anomaly Datasets in Edge-Cloud Integrated Computing Environments

Real3D-AD: A Dataset of Point Cloud Anomaly Detection

Anomaly detection in cloud environment using artificial intelligence techniques

Sliced-Wasserstein-based Anomaly Detection and Open Dataset for Localized Critical Peak Rebates

A Hierarchical Approach to Conditional Random Fields for System Anomaly Detection

Anomaly Detection at Scale: The Case for Deep Distributional Time Series Models

Unsupervised real-time anomaly detection for streaming data

An Anomaly-based Detection System for Monitoring Kubernetes Infrastructures

Signature-based Adaptive Cloud Resource Usage Prediction Using Machine Learning and Anomaly Detection