Markov Boundary-Based Outlier Mining

Kui Yu, Huanhuan Chen
DOI: https://doi.org/10.1109/tnnls.2018.2861743
IF: 14.255
2019-01-01
IEEE Transactions on Neural Networks and Learning Systems
Abstract: It is a grand challenge to identify the outliers existing in subspaces of a high-dimensional data set. A brute-force method is computationally prohibitive since it requires examining an exponential number of subspaces. Current state-of-the-art methods explore various heuristics to significantly prune subspaces, facing a tradeoff between subspace completeness and search efficiency. In this brief, we discuss a principal type of subspace outliers whose behavior differs from that of the other objects on individual attributes. We formulate such outliers by a novel notion of Markov boundary-based (MBB) outliers. The central idea is that, for each attribute $T$ in a data set, we consider only the subspace representing the knowledge needed to predict the behavior on $T$, which is captured by the Markov boundary (MB) of $T$. The outliers whose behavior on $T$ differs from that of the others can then be detected in the subspace of the MB, and thus our approach reduces the number of possible subspaces from exponential to linear with respect to dimensionality. Using both synthetic and real data sets, we validate the effectiveness and efficiency of our method.
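The abstract does not spell out the detection procedure, so the following is only a minimal sketch of the one-subspace-per-attribute idea under stated assumptions. It assumes the MB of every attribute is already known (passed in as a hypothetical `mbs` dictionary rather than learned), and it swaps in a simple scoring rule that is not taken from the paper: each attribute $T$ is predicted from $MB(T)$ by least squares, and a point is scored by its robust z-score (median absolute deviation) of the residual. The names `solve`, `residuals`, `mbb_scores`, the toy data, and the threshold 3.5 are all illustrative choices.

```python
from statistics import median
from typing import Dict, List, Sequence

Row = Sequence[float]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def residuals(data: List[Row], target: int, mb: List[int]) -> List[float]:
    """Least-squares residuals of attribute `target` regressed on its MB plus an intercept.

    Assumption: a linear predictor stands in for "the knowledge needed to predict T".
    """
    xs = [[1.0] + [float(row[j]) for j in mb] for row in data]
    ys = [float(row[target]) for row in data]
    k = len(xs[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(k)]
    beta = solve(xtx, xty)
    return [y - sum(b * v for b, v in zip(beta, x)) for x, y in zip(xs, ys)]


def mbb_scores(data: List[Row], mbs: Dict[int, List[int]]) -> List[float]:
    """Score each row by its largest robust residual z-score over all attributes.

    Only one subspace, MB(T), is examined per attribute T: d subspaces, not 2^d - 1.
    """
    scores = [0.0] * len(data)
    for target, mb in mbs.items():
        res = residuals(data, target, mb)
        med = median(res)
        mad = median(abs(r - med) for r in res) or 1e-12  # guard against a zero MAD
        for i, r in enumerate(res):
            # 1.4826 * MAD estimates the standard deviation under normality
            scores[i] = max(scores[i], abs(r - med) / (1.4826 * mad))
    return scores


# Toy data (illustrative only): column 1 is roughly 2 * column 0; column 2 is independent.
data = [
    [1, 2.1, 0.5], [2, 3.9, 0.2], [3, 6.1, 0.9], [4, 7.9, 0.4],
    [5, 16.0, 0.6],  # 16.0 is unremarkable for column 1 alone, but not given column 0
    [6, 11.9, 0.1], [7, 14.1, 0.8], [8, 15.9, 0.3], [9, 18.1, 0.7], [10, 19.9, 0.5],
]
# Hypothetical MBs, assumed known here rather than discovered from the data.
mbs = {0: [1], 1: [0], 2: []}

scores = mbb_scores(data, mbs)
print([i for i, s in enumerate(scores) if s > 3.5])  # -> [4]
print(2 ** len(mbs) - 1, "candidate subspaces vs", len(mbs), "MB subspaces")  # -> 7 candidate subspaces vs 3 MB subspaces
```

Row 4 is not extreme on any single attribute's marginal distribution; it stands out only in the subspace formed by attribute 1 and its MB, which is the kind of outlier the abstract describes. Replacing the linear predictor or the MAD-based score with another model would keep the same linear number of subspaces.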