MWFP-outlier: Maximal weighted frequent-pattern-based approach for detecting outliers from uncertain weighted data streams
Saihua Cai,Li Li,Jinfu Chen,Kaiyi Zhao,Gang Yuan,Ruizhi Sun,Rexford Nii Ayitey Sosu,Longxia Huang
DOI: https://doi.org/10.1016/j.ins.2022.01.028
IF: 8.1
2022-04-01
Information Sciences
Abstract: Many outlier detection approaches have been proposed for identifying previously unknown outliers, thereby improving the credibility of data. However, previous outlier detection approaches have two problems. First, most approaches were designed for static, precise datasets, so their detection accuracy is very low when processing uncertain data streams. Second, these approaches assumed that the importance (i.e., weight) of every pattern is the same, which does not accurately reflect some real-life situations. To solve these problems, we propose an efficient maximal weighted frequent-pattern-based outlier detection approach, called MWFP-Outlier, for accurately detecting potential outliers from uncertain data streams in two phases: a pattern mining phase and an outlier detection phase. In the pattern mining phase, by fully considering the existential probabilities and weights of each pattern, we propose the MWFP-Mine approach to accurately and efficiently mine maximal weighted frequent patterns based on the designed tree structure, list structure, and pruning strategies. In the outlier detection phase, we design four deviation indices to accurately measure the deviation degree of each transaction, and the top-k ranked transactions are then identified as potential outliers. Extensive experimental results demonstrate that the MWFP-Outlier approach can accurately detect outliers from uncertain weighted data streams while consuming less time.
computer science, information systems
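
The two-phase idea in the abstract can be illustrated with a minimal sketch. It is not the authors' MWFP-Mine or MWFP-Outlier implementation: the abstract does not define the tree structure, list structure, pruning strategies, or the four deviation indices, so none of them are reproduced here. The sketch assumes that item probabilities are independent, that a pattern's weight is the mean of its item weights, and that a single coverage-based score stands in for the deviation indices. All function names, the sample window, the weights, and the threshold are hypothetical.

```python
from itertools import combinations
from math import prod


def pattern_probability(pattern, transaction):
    """Probability that every item of `pattern` occurs in `transaction`
    (assumption: item existential probabilities are independent)."""
    return prod(transaction.get(item, 0.0) for item in pattern)


def weighted_expected_support(pattern, window, weights):
    """Assumed definition: mean item weight times expected support."""
    mean_weight = sum(weights[item] for item in pattern) / len(pattern)
    return mean_weight * sum(pattern_probability(pattern, t) for t in window)


def maximal_weighted_frequent_patterns(window, weights, min_support):
    """Brute-force phase 1: enumerate every itemset, keep the frequent ones,
    then keep only those with no frequent proper superset.
    No pruning is attempted, so this is only usable on tiny inputs."""
    items = sorted({item for t in window for item in t})
    frequent = {}
    for size in range(1, len(items) + 1):
        for pattern in combinations(items, size):
            support = weighted_expected_support(pattern, window, weights)
            if support >= min_support:
                frequent[frozenset(pattern)] = support
    return {p: s for p, s in frequent.items()
            if not any(p < q for q in frequent)}


def deviation_scores(window, maximal):
    """Stand-in for the deviation indices: a transaction that is poorly
    covered by the maximal patterns receives a score close to 1."""
    total = sum(maximal.values())
    scores = []
    for t in window:
        covered = sum(s * pattern_probability(p, t) for p, s in maximal.items())
        scores.append(1.0 - covered / total if total else 1.0)
    return scores


def top_k_outliers(window, weights, min_support, k):
    """Phase 2: rank transactions by score and return the top-k indices."""
    maximal = maximal_weighted_frequent_patterns(window, weights, min_support)
    scores = deviation_scores(window, maximal)
    ranked = sorted(range(len(window)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Hypothetical window: each transaction maps an item to its existential probability.
window = [
    {"a": 0.9, "b": 0.8},
    {"a": 0.8, "b": 0.9},
    {"a": 0.9, "b": 0.7, "c": 0.2},
    {"c": 0.9, "d": 0.6},
]
weights = {"a": 1.0, "b": 0.8, "c": 0.5, "d": 0.4}

maximal = maximal_weighted_frequent_patterns(window, weights, min_support=1.0)
outliers = top_k_outliers(window, weights, min_support=1.0, k=1)
print(outliers)  # the fourth transaction (index 3) ranks first
```

With this sample window the only maximal pattern is {a, b}, and the fourth transaction contains neither item, so it receives the highest score. The exhaustive enumeration is exponential in the number of items; it is used here only because a mean-weight support is not guaranteed to be anti-monotone, which rules out naive Apriori-style pruning in a sketch this small.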