Outlier detection for incomplete real-valued data via information entropy and class-consistent technology

Xiaopeng Cai, Zhaowen Li
DOI: https://doi.org/10.1007/s10489-024-05428-8
IF: 5.3
2024-04-20
Applied Intelligence
Abstract: Outlier detection aims to find data points that differ significantly from other observed values. It is widely used in fraud detection, network security, and medicine. Most existing outlier detection methods do not fully account for missing values in data sets. This paper studies outlier detection for incomplete real-valued data using information entropy and rough set theory (RST). First, a tolerance relation based on class-consistent technology is introduced to describe the similarity between information values in an incomplete real-valued information system (IRVIS). Then, tolerance classes are formed from the tolerance relation and used to calculate information entropy and other metrics. Next, an outlier factor is defined for each object in an IRVIS to describe its uncertainty and its degree of outlierness. Finally, an outlier detection method for an IRVIS is proposed, and the corresponding algorithm (CIEOD) is designed. The proposed method is compared with five other detection methods in numerical experiments on UCI data. The experimental results show that the CIEOD algorithm is more efficient. To make the comparison comprehensive, Precision, Recall, F1-measure, and the ROC curve are used to describe the strengths of the proposed method.
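The pipeline outlined in the abstract (tolerance relation, tolerance classes, entropy, per-object score) can be illustrated with a minimal Python sketch. This is not the CIEOD algorithm: the abstract does not give the paper's definitions, so everything below is an assumption made for illustration. Here two objects are treated as tolerant when, on every attribute, either value is missing or the values differ by at most a threshold `delta`; the entropy is a generic tolerance-class entropy; and the score `1 - |T(x)|/n` is a stand-in for the paper's outlier factor. The class-consistent construction is not reproduced, and all function names are hypothetical.

```python
import math

MISSING = None  # marker for a missing value (assumption of this sketch)


def tolerant(x, y, delta):
    """Assumed tolerance relation: on every attribute, a value is
    missing or the two values differ by at most delta."""
    return all(
        a is MISSING or b is MISSING or abs(a - b) <= delta
        for a, b in zip(x, y)
    )


def tolerance_classes(data, delta):
    """For each object i, the set of object indices tolerant with i.
    The relation is reflexive and symmetric, so classes may overlap."""
    n = len(data)
    return [
        {j for j in range(n) if tolerant(data[i], data[j], delta)}
        for i in range(n)
    ]


def entropy(classes):
    """Generic tolerance-class entropy: -(1/n) * sum(log2(|T(x)| / n)).
    An illustrative choice, not necessarily the paper's definition."""
    n = len(classes)
    return -sum(math.log2(len(c) / n) for c in classes) / n


def outlier_scores(classes):
    """Illustrative score: objects with small tolerance classes are
    more isolated, so they score closer to 1."""
    n = len(classes)
    return [1 - len(c) / n for c in classes]


# Toy incomplete real-valued table: 4 objects, 2 attributes.
data = [
    [0.10, 0.20],
    [0.12, MISSING],
    [0.11, 0.22],
    [0.90, 0.95],
]
classes = tolerance_classes(data, delta=0.05)
scores = outlier_scores(classes)
```

In this toy table the first three objects fall into one shared tolerance class, while the fourth is tolerant only with itself and therefore receives the highest score. A real implementation would replace the fixed `delta` and the simple score with the definitions given in the paper.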
computer science, artificial intelligence