Combine Value Clustering and Weighted Value Coupling Learning for Outlier Detection in Categorical Data

Hongzuo Xu, Yongjun Wang, Zhongke Wu, Xingkong Ma, Zhiquan Qin
DOI: https://doi.org/10.1007/978-3-319-98812-2_40
2018-01-01
Abstract: This paper introduces WOD, a novel unsupervised method for identifying outliers in categorical data. Existing subspace-based methods are challenged by an overwhelming number of irrelevant features, and their performance sometimes depends heavily on the chosen subspace size. Feature selection-based methods may discard relevant information when removing features. In contrast, WOD explores subspaces at the value level, i.e., it separates out irrelevant values based on the value cluster structure, which avoids the dilemma of setting a subspace size. Value outlierness is estimated by modeling weighted value couplings between the relevant value set and the full value set, which further eliminates interference from noisy features. We show that (i) WOD significantly outperforms five state-of-the-art outlier detectors on 12 real-world data sets with different levels of noisy features; and (ii) WOD achieves good scalability.