Abstract: Anomaly or outlier detection is a major challenge in big data analytics because anomaly patterns provide valuable insights for decision-making in a wide range of applications. Recently proposed anomaly detection methods based on the tree isolation mechanism are very fast due to their logarithmic time complexity, making them capable of handling big data sets efficiently. However, the underlying similarity or distance measures in these methods have not been well understood. Contrary to the claims that these methods never rely on any distance measure, we find that they have close relationships with certain distance measures. This implies that the current use of this fast isolation mechanism is limited to these distance measures and fails to generalise to other commonly used measures. In this paper, we propose a generic framework named LSHiForest for fast tree isolation based ensemble anomaly analysis with the use of a Locality-Sensitive Hashing (LSH) forest. Being generic, the proposed framework can be instantiated with a diverse range of LSH families, and the fast isolation mechanism can be extended to any distance measure, data type and data space where an LSH family is defined. In particular, the instances of our framework with kernelised LSH families or learning based hashing schemes can detect complicated anomalies like local or surrounded anomalies. We also formally show that the existing tree isolation based detection methods are special cases of our framework with the corresponding distance measures. Extensive experiments on both synthetic and real-world benchmark data sets show that the framework can achieve both high time efficiency and anomaly detection quality.
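The idea of isolating points with an LSH forest can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Gaussian random-projection (p-stable) LSH family for Euclidean distance, and every name and parameter in it (`make_l2_hash`, `build_forest`, `width`, `sample_size`, `max_depth`) is hypothetical. Each tree is a prefix tree in which level d splits points by the key of the d-th hash function, and a point that is separated from the rest after few levels is treated as more anomalous. Refinements a full implementation would likely need, such as a leaf-size adjustment of the path length, are left out.

```python
import math
import random


def make_l2_hash(dim, width, rng):
    """Draw one hash h(x) = floor((a.x + b) / width) with Gaussian a (assumed LSH family)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, width)
    return lambda x: math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / width)


def build_tree(points, hashes, depth=0):
    """Prefix tree: the node at `depth` groups its points by the key of hashes[depth]."""
    if len(points) <= 1 or depth == len(hashes):
        return {"size": len(points)}  # leaf: isolated point or depth limit reached
    buckets = {}
    for p in points:
        buckets.setdefault(hashes[depth](p), []).append(p)
    return {"children": {k: build_tree(v, hashes, depth + 1) for k, v in buckets.items()}}


def path_length(x, tree, hashes, depth=0):
    """Number of levels traversed before x ends up alone or in a leaf."""
    if "children" not in tree:
        return depth
    child = tree["children"].get(hashes[depth](x))
    if child is None:
        return depth + 1  # x falls into a bucket no training point occupies
    return path_length(x, child, hashes, depth + 1)


def build_forest(data, n_trees=50, sample_size=64, max_depth=12, width=4.0, seed=0):
    """Ensemble of trees, each with its own hash sequence and random subsample."""
    rng = random.Random(seed)
    dim = len(data[0])
    forest = []
    for _ in range(n_trees):
        hashes = [make_l2_hash(dim, width, rng) for _ in range(max_depth)]
        sample = rng.sample(data, min(sample_size, len(data)))
        forest.append((build_tree(sample, hashes), hashes))
    return forest


def mean_path_length(x, forest):
    """Average isolation depth over the ensemble; smaller means more anomalous."""
    return sum(path_length(x, tree, hashes) for tree, hashes in forest) / len(forest)
```

In this sketch, swapping `make_l2_hash` for a hash drawn from another LSH family (for example one defined for angular or Jaccard similarity) changes the distance measure that drives the isolation without touching the tree code, which is the kind of generality the abstract describes.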

Isolation-Based Anomaly Detection

Isolation-based Anomaly Detection Using Nearest-Neighbor Ensembles

OptIForest: Optimal Isolation Forest for Anomaly Detection

Isolation Forest Based Anomaly Detection Framework on Non-IID Data

Anomaly Detection in Network Management System Based on Isolation Forest

Efficient Anomaly Detection by Isolation Using Nearest Neighbour Ensemble

Deep Optimal Isolation Forest with Genetic Algorithm for Anomaly Detection

Data Anomaly Detection Based on Isolation Forest Algorithm

Anomaly Detection Based on Isolation Mechanisms: A Survey

LSHiForest: A Generic Framework for Fast Tree Isolation Based Ensemble Anomaly Analysis

Adaptive Anomaly Detection using Isolation Forest

Isolation Mondrian Forest for Batch and Online Anomaly Detection

Improved Anomaly Detection by Using the Attention-Based Isolation Forest

Functional Isolation Forest

Weighted Isolation and Random Cut Forest Algorithms for Anomaly Detection

Spectral-Spatial Anomaly Detection of Hyperspectral Data Based on Improved Isolation Forest

Incremental Isolation Forest to Handle Concept Drift in Anomaly Detection

An Innovative Application of Isolation-Based Nearest Neighbor Ensembles on Hyperspectral Anomaly Detection

On Detecting Clustered Anomalies Using Sciforest

Sparse random projection isolation forest for outlier detection

An improved X-means and isolation forest based methodology for network traffic anomaly detection