Outlier detection by sampling with accuracy guarantees

Mingxi Wu, Christopher Jermaine, Naoki Abe, Bianca Zadrozny, John Langford
DOI: https://doi.org/10.1145/1150402.1150501
2006-01-01
Abstract: An effective approach to detecting anomalous points in a data set is distance-based outlier detection. This paper describes a simple sampling algorithm to efficiently detect distance-based outliers in domains where each and every distance computation is very expensive. Unlike any existing algorithms, the sampling algorithm requires a fixed number of distance computations and can return good results with accuracy guarantees. The most computationally expensive aspect of estimating the accuracy of the result is sorting all of the distances computed by the sampling algorithm. The experimental study on two expensive domains as well as ten additional real-life datasets demonstrates both the efficiency and effectiveness of the sampling algorithm in comparison with the state-of-the-art algorithm and the reliability of the accuracy guarantees.
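
The idea of bounding the number of distance computations can be illustrated with a minimal sketch. It is not taken from the paper: it assumes the outlier score of a point is the distance to its k-th nearest neighbor, and it estimates that score from a uniform random sample of the other points. The names `sampled_outlier_scores`, `top_outliers`, `sample_size`, `k`, and `top_n` are hypothetical, and the accuracy-guarantee step mentioned in the abstract is not shown.

```python
import random


def sampled_outlier_scores(points, distance, sample_size, k, seed=0):
    """Estimate each point's k-th nearest neighbor distance from a random sample.

    Assumption (not from the abstract): a point's outlier score is its distance
    to the k-th nearest neighbor, approximated within a uniform sample of the
    other points. Exactly min(sample_size, n - 1) distances are computed per point.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)
    scores = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        sample = rng.sample(others, min(sample_size, len(others)))
        dists = sorted(distance(points[i], points[j]) for j in sample)
        # If the sample holds fewer than k points, fall back to the farthest one.
        scores.append(dists[min(k, len(dists)) - 1])
    return scores


def top_outliers(points, distance, sample_size, k, top_n, seed=0):
    """Return the indices of the top_n points with the largest estimated scores."""
    scores = sampled_outlier_scores(points, distance, sample_size, k, seed)
    order = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return order[:top_n]
```

With `n` points, this sketch performs at most `n * sample_size` distance computations regardless of how the data is distributed, which is the property that matters when a single distance is expensive to evaluate.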