Abstract: Outlier detection amounts to finding data points that differ significantly from the norm. Classic outlier detection methods are largely designed for a single data type, either continuous or discrete. However, real-world data is increasingly heterogeneous, and a data point can have both discrete and continuous attributes. Handling mixed-type data in a disciplined way remains a great challenge. In this paper, we propose a new unsupervised outlier detection method for mixed-type data based on the Mixed-variate Restricted Boltzmann Machine (Mv.RBM). Mv.RBM is a principled probabilistic method that models data density. We propose to use the free energy derived from Mv.RBM as an outlier score, detecting outliers as those data points lying in low-density regions. The method is fast to learn and compute, and is scalable to massive datasets. At the same time, the outlier score is identical to the negative log-density of the data up to an additive constant. We evaluate the proposed method on synthetic and real-world datasets and demonstrate that (a) proper handling of mixed types is necessary in outlier detection, and (b) the free energy of Mv.RBM is a powerful and efficient outlier scoring method that is highly competitive with state-of-the-art methods.
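
The scoring idea in the abstract can be illustrated with a minimal sketch. It is not the paper's Mv.RBM implementation: it assumes a simplified RBM with unit-variance Gaussian units for standardized continuous attributes, binary units for one-hot encoded discrete attributes, and binary hidden units. Under that assumption the free energy is F(v) = 0.5 * ||x_cont||^2 - a . v - sum_k softplus(b_k + W_k . v), and since p(v) is proportional to exp(-F(v)), F(v) equals the negative log-density up to the constant log Z. The names `free_energy` and `flag_outliers`, the parameter names, and the `contamination` threshold are hypothetical choices for illustration, and the parameters are assumed to come from a model trained elsewhere.

```python
import numpy as np


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return np.logaddexp(0.0, z)


def free_energy(x_cont, x_disc, a_cont, a_disc, W_cont, W_disc, b):
    """Free energy of each row under a simplified mixed-input RBM (assumed form).

    x_cont: (n, d_c) standardized continuous attributes (unit-variance Gaussian units)
    x_disc: (n, d_d) one-hot or binary encoded discrete attributes
    a_cont, a_disc: visible biases; W_cont, W_disc: weights; b: (K,) hidden biases
    """
    # Total input to each of the K binary hidden units.
    hidden_input = b + x_cont @ W_cont + x_disc @ W_disc
    # Visible-only part: quadratic term for Gaussian units, linear bias terms for all.
    visible_term = (0.5 * np.sum(x_cont ** 2, axis=1)
                    - x_cont @ a_cont - x_disc @ a_disc)
    # Summing out the hidden units contributes one softplus per hidden unit.
    return visible_term - softplus(hidden_input).sum(axis=1)


def flag_outliers(scores, contamination=0.05):
    # Higher free energy means lower density; flag the top `contamination` fraction.
    threshold = np.quantile(scores, 1.0 - contamination)
    return scores > threshold
```

Because the additive constant log Z is the same for every data point, ranking points by free energy gives the same ordering as ranking them by negative log-density, so the partition function never needs to be computed for scoring.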

Outlier Detection on Mixed-Type Data: An Energy-based Approach
