Outlier detection using conditional information entropy and rough set theory
Zhaowen Li,Shengxue Wei,Suping Liu
DOI: https://doi.org/10.3233/jifs-236009
2023-11-25
Journal of Intelligent & Fuzzy Systems
Abstract:Outlier detection is critically important in the field of data mining. Real-world data have the impreciseness and ambiguity which can be handled by means of rough set theory. Information entropy is an effective way to measure the uncertainty in an information system. Most outlier detection methods may be called unsupervised outlier detection because they are only dealt with unlabeled data. When sufficient labeled data are available, these methods are used in a decision information system, which means that the decision attribute is discarded. Thus, these methods maybe not right for outlier detection in a a decision information system. This paper proposes supervised outlier detection using conditional information entropy and rough set theory. Firstly, conditional information entropy in a decision information system based on rough set theory is calculated, which provides a more comprehensive measure of uncertainty. Then, the relative entropy and relative cardinality are put forward. Next, the degree of outlierness and weight function are presented to find outlier factors. Finally, a conditional information entropy-based outlier detection algorithm is given. The performance of the given algorithm is evaluated and compared with the existing outlier detection algorithms such as LOF, KNN, Forest, SVM, IE, and ECOD. Twelve data sets have been taken from UCI to prove its efficiency and performance. For example, the AUC value of CIE algorithm in the Hayes data set is 0.949, and the AUC values of LOF, KNN, SVM, Forest, IE and ECOD algorithms in the Hayes data set are 0.647, 0.572, 0.680, 0.676, 0.928 and 0.667, respectively. The advantage of the proposed outlier detection method is that it fully utilizes the decision information.