Abstract: Outlier detection is of vital importance in data mining, with numerous applications including video surveillance and credit card fraud detection. Many outlier detection algorithms have been developed and have received considerable attention; most existing methods are either distance-based or density-based. However, both approaches have flaws: the former has difficulty detecting local outliers, and the latter cannot handle low-density patterns. Moreover, outlier detection algorithms are sensitive to parameter settings. To address these issues, this paper proposes a simple and efficient outlier detection approach, called ADD, based on the average divergence difference of data objects; the method does not require the number of neighbors k to be set manually. Two new measures, the divergence factor (DF) and the average divergence difference (LADD), are developed based on the skewed distribution characteristics of data objects and their natural neighbors, improving the accuracy of local outlier detection. They serve as external and internal characterization factors, respectively: the former characterizes the skewed distribution and compactness relationship of data objects, and the latter represents the difference in the skewed distribution characteristics of data objects within a neighborhood. An appropriate threshold is then set to decide whether a data point is an outlier, which eliminates the interference of the Top-N problem. Experimental results show that, compared with state-of-the-art algorithms, the ADD algorithm achieves an overall improvement in local outlier detection, especially for outliers in some datasets with complex distributions and in low-density areas.

K-means Algorithm Based on Outliers Detection

A local search algorithm for k-means with outliers

Detecting outliers by clustering algorithms

Outlier Detection using Improved Genetic K-means

A Novel Effective Distance Measure and a Relevant Algorithm for Optimizing the Initial Cluster Centroids of K-means

An Improved K-Means Clustering Algorithm Based on Spectral Method

MSD-Kmeans: A Novel Algorithm for Efficient Detection of Global and Local Outliers

On Saving Outliers for Better Clustering over Noisy Data

Clustering With Outlier Removal

MSD-Kmeans: A Hybrid Algorithm for Efficient Detection of Global and Local Outliers

A New Outlier Detection Algorithm Based on Fast Density Peak Clustering Outlier Factor

A Novel K'-Means Algorithm For Clustering Analysis

A fast MST-inspired kNN-based outlier detection method

Subspace Clustering by Directly Solving Discriminative K-means

Outlier detection algorithm based on k-nearest neighbors-local outlier factor

Evolution of $K$-means solution landscapes with the addition of dataset outliers and a robust clustering comparison measure for their analysis

Outlier Detection with Cluster Catch Digraphs

Outliers Detection Is Not So Hard: Approximation Algorithms for Robust Clustering Problems Using Local Search Techniques

K*-Means: An Efficient Clustering Algorithm with Adaptive Decision Boundaries

ADD: a new average divergence difference-based outlier detection method with skewed distribution of data objects

r-Reference points based k-means algorithm