Abstract: Semi-supervised anomaly detection is based on the principle that potential anomalies are those records that look different from normal training data. However, in some cases we are specifically interested in anomalies that correspond to high attribute values (or low, but not both). We present two asymmetrical distance measures that take this directionality into account: ramp distance and signed distance. Through experiments on synthetic and real-life datasets we show that ramp distance performs as well as or better than the absolute distance traditionally used in anomaly detection. While signed distance also performs well on synthetic data, it performs substantially worse on real-life datasets. We argue that this reflects the fact that in practice, good scores on some attributes should not be allowed to compensate for bad scores on others.
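The three distances compared in the abstract differ only in the per-attribute term that is summed over attributes. The following is a minimal sketch, assuming numeric attributes and, for simplicity, the same per-attribute term for every attribute (the definitions allow a different \(d_j\) per attribute). The function names and toy values are illustrative, not taken from the paper.

```python
import numpy as np


def directional_distance(y, x, per_attr):
    """Compute d(y, x) = sum_j d_j(y_j - x_j), applying one per-attribute term d_j to all attributes."""
    diff = np.asarray(y, dtype=float) - np.asarray(x, dtype=float)
    return float(np.sum(per_attr(diff)))


def absolute(t):
    # Traditional choice: deviations in either direction count equally.
    return np.abs(t)


def ramp(t):
    # Ramp: only deviations where the test value exceeds the training value count.
    return np.maximum(0.0, t)


def signed(t):
    # Signed: low values subtract from the total, high values add to it.
    return t


y = [3.0, 0.0]  # test record (toy values)
x = [1.0, 1.0]  # training record (toy values)
for name, f in [("absolute", absolute), ("ramp", ramp), ("signed", signed)]:
    print(name, directional_distance(y, x, f))
```

Here the per-attribute differences are 2 and -1, so the absolute, ramp and signed distances are 3.0, 2.0 and 1.0 respectively: the ramp distance ignores the low value, while the signed distance lets it cancel part of the high one.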

What problem does this paper attempt to address?

The problem this paper attempts to solve is how, in semi-supervised anomaly detection, to use domain knowledge to handle attributes with directionality, that is, attributes for which only relatively high values (or only relatively low values) should be regarded as anomalous. Specifically, the authors propose and study two asymmetric distance measures, ramp distance and signed distance, that capture this directionality.

### Problem Background

In traditional semi-supervised anomaly detection, the model is trained only on normal data and attempts to distinguish normal data from abnormal data. In some application scenarios, however, we are only interested in anomalies in a specific direction. For example, in machine fault detection we may only care about excessive workload, and in medical diagnosis we may only care about elevated risk factors, not about low risk factors or abnormally healthy patients.

### Proposed Solutions

To meet this challenge, the authors propose two new distance measures. Both decompose the distance between a test record \(y\) and a training record \(x\) over the \(m\) attributes as

\[ d(y, x)=\sum_{j \leq m} d_j(y_j - x_j) \]

and differ only in the per-attribute term \(d_j\). For comparison, the absolute distance traditionally used in anomaly detection corresponds to \(d_j(y_j - x_j)=|y_j - x_j|\).

1. **Ramp Distance**:

   \[ d_j(y_j - x_j)=\max(0, y_j - x_j) \]

   Only attributes on which the test record exceeds the training record contribute to the distance, so only deviations toward high values can make a record look anomalous.

2. **Signed Distance**:

   \[ d_j(y_j - x_j)=y_j - x_j \]

   The signed distance uses the raw difference in attribute values and allows negative terms. This can be interpreted as low values providing negative evidence and high values providing positive evidence of an anomaly; because the terms are summed, a low value on one attribute can offset a high value on another.

### Experimental Results

Through experiments on synthetic and real-life datasets, the authors found that:

- On the synthetic datasets, the signed distance performs slightly better than the ramp distance in some cases.
- On the real-life datasets, the ramp distance performs substantially better than the signed distance, especially when there are multiple risk factors, where an unexpectedly low value on one attribute should not compensate for high values on others.

### Conclusions

The authors suggest that in practical applications, if some attributes are known to be risk factors (that is, only high values are meaningful), the ramp distance should be used. Conversely, if the absolute distance outperforms the ramp distance on a particular dataset, this may be because the directionality assumption does not hold for some attributes, or because there is no clear causal relationship linking low values to high risk.

In summary, by introducing directional anomaly detection, this paper provides a more flexible and effective approach to anomaly detection, particularly in applications involving risk factors.
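The compensation effect behind the real-life results can be illustrated with a small nearest-neighbour sketch. The scoring rule used here (a test record's anomaly score is its smallest distance to any normal training record) and the toy data are assumptions made for illustration; the summary does not specify how the paper turns distances into anomaly scores. The names `nn_score`, `ramp_term` and `signed_term` are hypothetical.

```python
import numpy as np


def ramp_term(diff):
    # Ramp per-attribute term: max(0, y_j - x_j).
    return np.maximum(0.0, diff)


def signed_term(diff):
    # Signed per-attribute term: y_j - x_j (may be negative).
    return diff


def nn_score(y, X_train, per_attr):
    """Hypothetical 1-NN anomaly score: the smallest summed distance from y to any normal training row."""
    diffs = np.asarray(y, dtype=float) - np.asarray(X_train, dtype=float)  # shape (n, m) via broadcasting
    return float(np.min(per_attr(diffs).sum(axis=1)))


# Toy normal training data with two risk factors (illustrative values only).
X_train = np.array([[0.5, 0.5], [0.4, 0.6], [0.6, 0.4]])
y_mixed = np.array([2.0, -1.5])   # one risk factor very high, the other unexpectedly low
y_typical = np.array([0.5, 0.5])  # identical to a training record

for name, term in [("ramp", ramp_term), ("signed", signed_term)]:
    print(name, nn_score(y_mixed, X_train, term), nn_score(y_typical, X_train, term))
```

With this toy data, the ramp distance gives the mixed record a clearly positive score (about 1.4) and the typical record a score of 0. Under the signed distance, the mixed record scores about -0.5, below the typical record's score of about 0, so the unexpectedly low value masks the high one, which is the kind of compensation the authors argue should not be allowed.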

Directional anomaly detection

Ordinal Regression for Direction-Related Anomaly Detection

Detecting Relative Anomaly

Anomaly detection using data depth: multivariate case

Toward Supervised Anomaly Detection

Anomaly Detection using Principles of Human Perception

Anomaly Detection with Partially Observed Anomalies

AD-MERCS: Modeling Normality and Abnormality in Unsupervised Anomaly Detection

A Latent Space Correlation-Aware Autoencoder for Anomaly Detection in Skewed Data

Enhancing Anomaly Detection via Generating Diversified and Hard-to-distinguish Synthetic Anomalies

Anomaly detection in surveillance video using motion direction statistics

Weighted subspace anomaly detection in high-dimensional space

SeMAnD: Self-Supervised Anomaly Detection in Multimodal Geospatial Datasets

Anomaly Detection Requires Better Representations

Factor Analysis of Mixed Data for Anomaly Detection

Drastic Anomaly Detection In Video Using Motion Direction Statistics

Unsupervised Anomaly Detection via Nonlinear Manifold Learning

A Non-Parametric Subspace Analysis Approach with Application to Anomaly Detection Ensembles

Anomaly Detection with Variance Stabilized Density Estimation

Unsupervised Learning Based Distributed Detection of Global Anomalies.

ESAD: End-to-end Semi-supervised Anomaly Detection