Abstract: Outlier detection, also known as anomaly detection, is a fundamental research task in data mining. It is mainly used to uncover unusual mechanisms or potential dangers, and aims to detect the few observations that deviate so much from the others that they are regarded as suspicious. Outliers, which are novel, abnormal, and rare, are often discarded as noise or abnormal data. They can be further classified into several types, such as local and partial outliers. Outlier detection techniques apply to many fields, including intrusion detection, fraud detection, and the identification of early signs of disease in medicine.

Defining outliers by their distance to neighboring data points has been shown to be an effective non-parametric approach to outlier detection. Because kNN-based algorithms can be applied to large data sets, they are widely used for distance-based and density-based outlier detection. Unfortunately, the time complexity of the kNN-based algorithm is O(N^2), so its cost grows rapidly with the size of the data set. Minimum spanning tree (MST)-based clustering algorithms that use Prim's or Kruskal's method have O(N^2) time and space complexity, and their clustering results depend on user-supplied parameters. Moreover, such algorithms cannot detect outliers inside high-density clusters. Existing MST-based algorithms become ineffective when given unsuitable parameters or applied to data sets composed of clusters with diverse shapes, sizes, and densities. In addition, most MST algorithms cannot build the tree dynamically, because they need to know the distance between every pair of points in advance.

To address these problems, we propose a new outlier detection method that combines the advantages of distance-based and density-based methods. First, the algorithm builds a split tree to store information about the data points. Second, it efficiently computes all well-separated pairs of the well-separated pair decomposition over the whole data set. Third, it partitions the input data set into several frames that satisfy a certain condition, so that each point's k nearest neighbors can be obtained quickly from the first two results. Fourth, a minimum spanning tree is built dynamically from the third result. We then rank the points suspected of being outliers by their outlier factor, using MST-based clustering that does not require the number of clusters to be specified manually. A new algorithm and a new metric are proposed to select the number of clusters and to avoid insignificant clusters. Finally, all outliers are detected. The time complexities of computing the kNN and of building the tree are O(kN) and O(N log N), respectively.

Experiments on a series of real and synthetic data sets verify the efficiency and effectiveness of KDNS, FkNN, and ADC, the algorithms proposed in this paper. The results show that the new method detects both local and global outliers without requiring the user to supply the number of clusters and that, compared with previous approaches, the proposed algorithms drastically reduce time complexity and significantly improve the outlier detection rate.
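
The two ingredients the abstract combines, kNN distances and a minimum spanning tree built from the kNN graph, can be illustrated with a minimal Python sketch. It is not the proposed method: the neighbor search below is a brute-force O(N^2) stand-in for the split-tree and well-separated pair decomposition steps, the score (mean distance to the k nearest neighbors) is a generic distance-based score rather than the paper's outlier factor, and all function names are hypothetical.

```python
import math


def knn_brute_force(points, k):
    """For each point, return its k nearest neighbors as (distance, index) pairs.

    Assumption: brute-force O(N^2) search, used here only as a stand-in for
    the split-tree / well-separated pair decomposition described in the text.
    """
    neighbors = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append(dists[:k])
    return neighbors


def knn_distance_scores(neighbors):
    """Generic distance-based score: mean distance to the k nearest neighbors."""
    return [sum(d for d, _ in nbrs) / len(nbrs) for nbrs in neighbors]


def mst_from_knn_graph(neighbors):
    """Kruskal's algorithm restricted to the edges of the kNN graph.

    Returns a list of (distance, i, j) edges. If the kNN graph is
    disconnected, the result is a spanning forest rather than a single tree.
    """
    n = len(neighbors)
    # Store each undirected edge once, smallest distance first.
    edges = sorted({(d, min(i, j), max(i, j))
                    for i, nbrs in enumerate(neighbors) for d, j in nbrs})
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for d, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # the edge joins two different components
            parent[root_i] = root_j
            tree.append((d, i, j))
    return tree


# Toy data: four points on a unit square plus one distant point (index 4).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
neighbors = knn_brute_force(points, k=2)
scores = knn_distance_scores(neighbors)
tree = mst_from_knn_graph(neighbors)

most_suspicious = max(range(len(points)), key=scores.__getitem__)
longest_edge = max(tree)
print(most_suspicious, longest_edge)
```

On this toy data the distant point receives the largest score and is attached to the rest of the tree by the longest edge, which is the intuition behind cutting long MST edges to separate outliers from clusters. Restricting Kruskal's algorithm to the kN edges of the kNN graph avoids needing all pairwise distances in advance.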

An explainable outlier detection method using region-partition trees

LP-Explain: Local Pictorial Explanation for Outliers.

Decision Tree Regression with Residual Outlier Detection

Robust Multi-Kernel Nearest Neighborhood for Outlier Detection

Sparse random projection isolation forest for outlier detection

Outlier Detection Using Diverse Neighborhood Graphs

A minimum spanning tree-inspired clustering-based outlier detection technique

Outlier Detection via Minimum Spanning Tree.

A method for outlier detection based on cluster analysis and visual expert criteria

Ordinal Outlier Detection Based On Recursive Uniform Partitioning

A New Outlier Detection Model Using Random Walk On Local Information Graph

A Fast kNN-Based MST Outlier Detection Method

Comparative Study of Neighbor-based Methods for Local Outlier Detection

A Probabilistic Transformation of Distance-Based Outliers

Privacy-Preserving Outlier Detection with High Efficiency over Distributed Datasets

SDROF: outlier detection algorithm based on relative skewness density ratio outlier factor

A fast MST-inspired kNN-based outlier detection method

An Optimized Computational Framework for Isolation Forest

A neighborhood weighted-based method for the detection of outliers

Outlier detection using conditional information entropy and rough set theory

Applying Anomaly Pattern Score for Outlier Detection