Outlier detection in a multiset-valued information system based on rough set theory and granular computing

Yan Song, Hai Lin, Zhaowen Li
DOI: https://doi.org/10.1016/j.ins.2023.119950
IF: 8.1
2023-12-01
Information Sciences
Abstract: Outlier detection on data with missing information values is especially tricky because the uncertainty caused by the missing values may itself contribute to an object being an outlier. A multiset-valued information system (MSVIS) is an information system (IS) in which information values are multisets. This kind of IS is a useful way of handling datasets with missing information values. In this paper, we study outlier detection in an MSVIS based on rough set theory and granular computing. First, some concepts of multisets and probability distribution sets are reviewed, and a weak one-to-one correspondence between multisets and rational probability distribution sets is illustrated, so that multisets may be treated as rational probability distribution sets. Second, it is shown that an MSVIS can be induced by an incomplete information system (IIS) and viewed as the result of information fusion of multiple categorical ISs. Third, a tolerance relation in an MSVIS is constructed from the induced rational probability distribution sets. Fourth, the outlier factor in an MSVIS is formulated, and the corresponding outlier detection algorithm is proposed. Finally, a performance evaluation by AUC (area under the curve) and F1-score shows the superiority of the proposed algorithm over some existing algorithms.
computer science, information systems
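The abstract's step from multisets to rational probability distribution sets can be illustrated with a minimal sketch. It is not the authors' implementation. The function names `to_distribution` and `induce_multisets` are hypothetical, and the rule used to turn an incomplete column into multisets (a missing cell becomes the multiset of all observed values, a known cell becomes a multiset holding only its own value) is an assumption made for illustration. The abstract does not spell out the paper's construction.

```python
from collections import Counter
from fractions import Fraction


def to_distribution(multiset):
    """Normalize a multiset (value -> multiplicity) into a rational
    probability distribution (value -> Fraction)."""
    total = sum(multiset.values())
    if total == 0:
        raise ValueError("an empty multiset has no distribution")
    return {v: Fraction(c, total) for v, c in multiset.items() if c > 0}


def induce_multisets(column, missing=None):
    """Hypothetical induction rule (an assumption, not taken from the paper):
    a missing cell becomes the multiset of all observed values in the column;
    a known cell becomes a multiset containing only that value."""
    observed = Counter(v for v in column if v is not missing)
    n = sum(observed.values())
    return [
        Counter(observed) if v is missing else Counter({v: n})
        for v in column
    ]


# One attribute column of an incomplete information system; None marks a gap.
column = ["a", "b", None, "a"]
cells = induce_multisets(column)
dists = [to_distribution(m) for m in cells]
```

Under this sketch, multisets that differ only by a common scaling of their multiplicities, such as `{a: 2, b: 1}` and `{a: 4, b: 2}`, normalize to the same distribution. That is one plausible reading of why the correspondence is called "weak".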