Abstract: Outlier detection is of vital importance in data mining, with numerous applications including video surveillance and credit card fraud detection. Many outlier detection algorithms have been developed and have received considerable attention; most existing methods are either distance-based or density-based. However, both approaches have flaws: the former has difficulty detecting local outliers, and the latter cannot handle low-density patterns. Moreover, these algorithms are sensitive to parameter settings. To address these issues, this paper proposes a simple and efficient outlier detection approach (called ADD) based on the average divergence difference of data objects, which does not require the number of neighbors k to be set manually. In this algorithm, two new measures, the divergence factor (DF) and the average divergence difference (LADD), are developed from the skewed distribution characteristics of data objects and their natural neighbors, improving the accuracy of local outlier detection. These measures serve as external and internal characterization factors, respectively: the former characterizes the skewed distribution and compactness relationship of data objects, and the latter represents the difference in the skewed distribution characteristics of data objects within a neighborhood. We then set an appropriate threshold to decide whether a data point is an outlier, which eliminates the interference of the Top-N problem. Experimental results show that, compared with state-of-the-art algorithms, ADD achieves an overall improvement in local outlier detection, especially on datasets with complex distributions and for outliers in low-density areas.

A New Outlier Detection Algorithm Based on Fast Density Peak Clustering Outlier Factor

A neighborhood weighted-based method for the detection of outliers

Comparative Density Peaks Clustering

Outlier Detection Algorithm Based on Reachable Neighbor

Constraint-based Clustering by Fast Search and Find of Density Peaks

A Grid-Based Density Peaks Clustering Algorithm

SDROF: outlier detection algorithm based on relative skewness density ratio outlier factor

A Fast Density Peak Clustering Method for Power Data Security Detection Based on Local Outlier Factors

Fast Clustering Using Adaptive Density Peak Detection

A fast MST-inspired kNN-based outlier detection method

Detecting outliers by clustering algorithms

ADD: a new average divergence difference-based outlier detection method with skewed distribution of data objects

A Novel Density Peaks Clustering Algorithm Based on K Nearest Neighbors with Adaptive Merging Strategy

A New Density Peak Clustering Algorithm Based on Cluster Fusion Strategy

Outlier detection method based on high-density iteration

Outlier detection algorithm based on k-nearest neighbors-local outlier factor

Outlier Detection with Cluster Catch Digraphs

A Fast Outlier Detection Method for Big Data

Info-Detection: An Information-Theoretic Approach To Detect Outlier

A method for outlier detection based on cluster analysis and visual expert criteria

A Novel Clustering Scheme based on Density Peaks and Spectral Analysis