An entropy-based algorithm to detect relative outliers: ENBROD

Yu Shao-Yue, Shang Lin
DOI: https://doi.org/10.3321/j.issn:0469-5097.2008.02.014
2008-01-01
Abstract: In outlier detection, many definitions of an outlier take a global view of the data set, and the resulting outliers can be viewed as "global" outliers. However, many interesting real-world data sets exhibit a more complex structure and contain another kind of outlier: objects that are outlying relative to their local neighborhoods, particularly with respect to the densities of those neighborhoods. These are regarded as "relative" outliers. This paper presents an entropy-based algorithm to detect relative outliers in data sets with categorical attributes. After introducing a new information gain, named the leave-one partition information gain, the paper defines an outlier factor called the Relative Outlier Factor (ROF) for each object. The outlier factor is relative in the sense that only a restricted neighborhood of each object is taken into account. The ROFs of two classic discrete data sets are shown to demonstrate the validity of ROF. Furthermore, the paper provides the algorithm ENBROD (ENtropy-Based Relative Outlier Detector) to compute the ROF of each object, and discusses the time complexity of ENBROD in detail. In the experimental part, an analysis of experiments on the zoo data set demonstrates that the outliers detected by ENBROD are meaningful in practice. The results on the Wisconsin breast cancer data set demonstrate that the ability of ENBROD to find global outliers is similar to that of several existing algorithms when the neighborhood is large enough. Moreover, ENBROD is able to find outliers that other algorithms are blind to when the neighborhood is smaller.
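The abstract does not state the formula for the leave-one partition information gain or for ROF, so the following Python sketch is only an illustration of the general idea, not the paper's method. It assumes a Hamming-distance neighborhood of size `k` and scores each object by how much the summed per-attribute Shannon entropy of its neighborhood rises when the object is included. All function names, the distance measure, and the scoring rule are assumptions made for this example.

```python
from collections import Counter
from math import log2


def attribute_entropy(rows):
    """Sum of per-attribute Shannon entropies (in bits) over categorical tuples."""
    if not rows:
        return 0.0
    n = len(rows)
    total = 0.0
    for column in zip(*rows):
        counts = Counter(column)
        total -= sum((c / n) * log2(c / n) for c in counts.values())
    return total


def hamming(a, b):
    """Number of attributes on which two categorical objects differ."""
    return sum(x != y for x, y in zip(a, b))


def leave_one_out_scores(data, k):
    """Hypothetical relative outlier score (NOT the paper's ROF).

    For each object, take its k nearest neighbors by Hamming distance and
    measure the entropy of the neighborhood with and without the object.
    A larger difference means the object disturbs its neighborhood more.
    """
    scores = []
    for i, obj in enumerate(data):
        others = [j for j in range(len(data)) if j != i]
        # Stable sort: ties in distance are broken by original index.
        others.sort(key=lambda j: hamming(obj, data[j]))
        neighbors = [data[j] for j in others[:k]]
        with_obj = attribute_entropy(neighbors + [obj])
        without_obj = attribute_entropy(neighbors)
        scores.append(with_obj - without_obj)
    return scores


# Toy data: four identical objects and one that differs on every attribute.
data = [("a", "x")] * 4 + [("b", "y")]
scores = leave_one_out_scores(data, k=3)
```

On this toy data the last object receives the largest score, because adding it to an otherwise uniform neighborhood raises the entropy, while the four identical objects score zero. Varying `k` mirrors the abstract's remark that the neighborhood size determines whether the detected outliers look more global or more local.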