Multi-Hierarchy Attribute Relationship Mining Based Outlier Detection for Categorical Data.

Xianyu Hu, Yijie Wang, Li Cheng
DOI: https://doi.org/10.1109/ijcnn.2019.8852383
2019-01-01
Abstract: Outlier detection for categorical data is important in many practical scenarios, such as intrusion detection, fraud detection, and early detection of diseases. However, categorical data has no inherent difference measure; the differences are hidden in complex attribute value relationships. Existing methods do not properly handle the internal and external relationships of attributes, resulting in low outlier detection accuracy. This paper proposes a novel unsupervised outlier detection method for categorical data based on Multi-Hierarchy Attribute Relationship Mining (MHARM), which detects outliers by mining the hierarchical and complex relationships between attribute values. MHARM first calculates the internal relationship: it processes each attribute independently via an information-theoretic difference to obtain an internal distance matrix. It then handles the different sub-hierarchies of the external relationship. It divides the attributes into two clusters, using mutual information as the correlation measure. For the external relationship of intra-cluster attributes, it iteratively updates an external distance matrix using an entropy-weighted Earth Mover's Distance (EMD) and the internal distance until convergence; for the external relationship of inter-cluster attributes, it takes the joint-entropy-weighted sum as the whole difference between objects. Finally, MHARM uses the sum of the whole differences between objects as the outlier score and ranks objects by this score to detect outliers. Experimental results on nine data sets show that MHARM achieves an average AUC 13.84% higher than state-of-the-art methods and significantly reduces the detection volume multiple (dvM) needed to unearth 90% of the outliers.
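
The abstract names mutual information as the correlation measure used to group attributes. The sketch below is only an illustration of that quantity for categorical columns, not the authors' implementation: the function names (entropy, mutual_information, pairwise_mi) and the toy columns are assumptions made here, and the clustering step, the EMD update, and the scoring are not shown because the abstract does not specify them.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of categorical values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from empirical frequencies."""
    joint = list(zip(xs, ys))
    return entropy(xs) + entropy(ys) - entropy(joint)


def pairwise_mi(columns):
    """Mutual information for every unordered pair of attributes.

    `columns` maps a (hypothetical) attribute name to its list of values.
    """
    names = list(columns)
    return {
        (a, b): mutual_information(columns[a], columns[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Toy data (made up for illustration): "flag" mirrors "proto",
# while "service" varies independently of both.
toy = {
    "proto": ["tcp", "tcp", "udp", "udp"],
    "flag": ["S", "S", "U", "U"],
    "service": ["http", "dns", "http", "dns"],
}
mi = pairwise_mi(toy)
```

In this toy example the pair (proto, flag) gets a higher score than either pair involving service, which is the kind of signal a correlation-based grouping of attributes would rely on.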