Abstract:Outlier detection,also known as anomaly detection,is a very important foundamental research task in the field of data mining.It is mainly used for finding strange mechanism or potential danger,and aims to detecting those outliers their observations deviate so much from other observations and they are few suspicious data.Outliers,which are novel,unmoral and few,are often abandoned as noise or abnormal data.Outliers are also classified as many types,such as local,partial and so on.The techniques of outlier detection can be applied to many fields such as intrusion behavior,fraud,signs of early disease in the medical field and so on.Defining outliers by their distance to neighboring data points has been shown to be an effective non-parametric approach to outlier detection.The kNN-based algorithm could be used in big data sets efficiently,so it is widely applied for outliers detection based on distance and density.Unfortunately,the kNN-based algorithm's time complexity is O(N2),and it will be greatly increased with the size of date sets.The time complexity and space complexity of minimum spanning tree-based clustering algorithms using Prim's or Kruskal's method is O(N2),and the result of clustering depends on inputting parameters by users.Moreover,this algorithm can't detect outliers in high-density clusters.The existing MST-based algorithms become ineffective when provided with unsuitable parameters or applied to datasets which are composed of clusters with diverse shapes,sizes,and densities.Meanwhile,the most MST algorithms couldn't build tree dynamically,because of needing to know the distance between any two points in advance.In order to address these challenging problems,we proposed a new outliers detection method,which absorbs the advantages of distance-based method and density-based method.Firstly,this algorithm builds a split-tree to storage the information among data points.Secondly,we efficiently acquire all sets of well-separated pair decomposition on the whole dataset.Thirdly,all this algorithm partitions the input data set into several frames which are satisfy certain condition so that we can quickly obtain each point's k-nearest neighbors on the basis of the first two results.Fourthly,a minimum spinning tree is dynamically built according to the third result.In addition,we rank points which are suspected as outliers on the basis of its outlier factor by using the MST-based clustering without inputting parameter of cluster numbers manually.A new algorithm and a new metric are proposed to select the exact number of clusters and avoid insignificant clusters.And we detect all outliers at last.The time complexity of computing kNN and creating tree are O(kN) and O(NlogN),respectively.The experiments show that this new algorithm can detect both local outliers and global outliers without inputting the number of clusters from users.In the experiments,we use a series of real datasets and synthetic datasets to verify the efficiency and effectiveness of KDNS,FkNN and ADC proposed in this paper.The experimental results show that comparing with the previous approaches,our proposed algorithms can drastically reduce time complexity and significantly improve the rate of outlier detection.

A Log-Based Anomaly Detection Method with Efficient Neighbor Searching and Automatic K Neighbor Selection

Log-Based Anomaly Detection with the Improved K-Nearest Neighbor.

An Improved KNN-Based Efficient Log Anomaly Detection Method with Automatically Labeled Samples

An Integrated Method for Anomaly Detection From Massive System Logs.

Natural Language Processing-based Model for Log Anomaly Detection

A fast anomaly network traffic detection method based on the constrained k-nearest neighbor

Anomaly detection based on Nearest Neighbor search with Locality-Sensitive B-tree.

LogAnomaly: Unsupervised Detection of Sequential and Quantitative Anomalies in Unstructured Logs

Applying Anomaly Pattern Score for Outlier Detection

Anomaly Detection with Representative Neighbors.

k-NNN: Nearest Neighbors of Neighbors for Anomaly Detection

Log2graphs: An Unsupervised Framework for Log Anomaly Detection with Efficient Feature Extraction

A Confidence-Guided Anomaly Detection Approach Jointly Using Multiple Machine Learning Algorithms

Anomaly Detection in Wireless Sensor Networks Based on KNN

A Fast kNN-Based MST Outlier Detection Method

SKDLog: Self-Knowledge Distillation-based CNN for Abnormal Log Detection.

Learning Minimum Volume Sets and Anomaly Detectors from KNN Graphs

Anomaly Detection for Remote Maintenance Control System of Natural Gas Pipeline Based on Nearest Neighbor Clustering

Ensemble Methods for Anomaly Detection Based on System Log.

An Anomaly Detection Framework Based On Autoencoder And Nearest Neighbor

LogKT: Hybrid Log Anomaly Detection Method for Cloud Data Center.