A minimum spanning tree-inspired clustering-based outlier detection technique

Xiaochun Wang,Xia Li Wang,D. Mitch Wilkes
DOI: https://doi.org/10.1007/978-3-642-31488-9_17
2012-01-01
Abstract:Due to its important applications in data mining, many techniques have been developed for outlier detection. In this paper, an efficient three-phase outlier detection technique. First, we modify the famous k-means algorithm for an efficient construction of a spanning tree which is very close to a minimum spanning tree of the data set. Second, the longest edges in the obtained spanning tree are removed to form clusters. Based on the intuition that the data points in small clusters may be most likely all outliers, they are selected and regarded as outlier candidates. Finally, density-based outlying factors, LOF, are calculated for potential outlier candidates and accessed to pinpoint the local outliers. Extensive experiments on real and synthetic data sets show that the proposed approach can efficiently identify global as well as local outliers for large-scale datasets with respect to the state-of-the-art methods.
What problem does this paper attempt to address?