A Fast Algorithm for Density-based Top-n Local Outlier Detection

Fang LIU, Jian-Peng QI, Yan-Wei YU, Lei CAO, Jin-Dong ZHAO
DOI: https://doi.org/10.16383/j.aas.c180425
2019-01-01
ACTA AUTOMATICA SINICA
Abstract: Local outlier factor (LOF) effectively addresses the problem of outlier detection in skewed datasets and has shown remarkable detection performance in a variety of applications. In this paper, we propose an efficient top-n local outlier detection algorithm, called MTLOF (Multi-granularity upper bound pruning based top-n LOF detection), for quickly detecting the top-n local outliers in large-scale datasets. First, we propose four LOF upper bounds that are closer to the real LOF value, so that exact LOF computations can be avoided, and we analyze their computational complexity theoretically. Second, by combining an index structure with the upper bounds UB1 and UB2, we propose a two-layer cell pruning strategy that not only adopts a global cell pruning strategy but also introduces a local pruning strategy based on the data objects inside each cell, which effectively prunes high-density regions. Third, we propose two more reasonable and effective data object pruning strategies using the proposed upper bounds UB3 and UB4. Because UB3 and UB4 are closer to the real LOF value, they allow more data objects to be pruned. In addition, computing the upper bounds with computation reuse greatly reduces the computational cost. Finally, we optimize the selection of the initial top-n local outliers by leveraging the established index structure. Specifically, we select the initial top-n local outliers in sparse regions, which favors data objects with larger LOF values as the initial local outliers. Experiments on six real-world datasets demonstrate the efficiency and scalability of the proposed MTLOF, which is up to 3.5 times faster than the state-of-the-art TOLF (Top-n LOF) method.
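
For orientation, the following is a minimal sketch of the brute-force baseline that pruning-based methods such as MTLOF aim to avoid: it computes the standard LOF score of every object and then keeps the n largest. It follows the textbook LOF definition only; it does not implement the upper bounds UB1 to UB4, the index structure, or the cell pruning described in the abstract, since those are not specified here. The function names (`knn`, `lof_scores`, `top_n_outliers`) are illustrative, and the code assumes no point has k or more exact duplicates (otherwise the reachability sum would be zero).

```python
import heapq
import math


def knn(points, i, k):
    """Return (k_distance, neighbors) for point i.

    neighbors is a list of (index, distance) pairs and includes every
    point tied at the k-distance, as in the standard LOF definition.
    """
    dists = sorted(
        (math.dist(points[i], q), j) for j, q in enumerate(points) if j != i
    )
    k_dist = dists[k - 1][0]
    return k_dist, [(j, d) for d, j in dists if d <= k_dist]


def lof_scores(points, k):
    """Compute the LOF score of every point by brute force (O(n^2 log n))."""
    n = len(points)
    info = [knn(points, i, k) for i in range(n)]
    k_dist = [kd for kd, _ in info]
    neigh = [nb for _, nb in info]

    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(p, o) = max(k-distance(o), d(p, o)).
    lrd = []
    for i in range(n):
        total = sum(max(k_dist[j], d) for j, d in neigh[i])
        lrd.append(len(neigh[i]) / total)

    # LOF(p): mean ratio of the neighbors' lrd to p's own lrd.
    return [
        sum(lrd[j] for j, _ in neigh[i]) / (len(neigh[i]) * lrd[i])
        for i in range(n)
    ]


def top_n_outliers(points, k, n):
    """Return the indices of the n points with the largest LOF scores."""
    scores = lof_scores(points, k)
    return heapq.nlargest(n, range(len(points)), key=scores.__getitem__)


if __name__ == "__main__":
    # Four points in a tight square plus one distant point.
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(top_n_outliers(data, k=2, n=1))
```

Every object's exact LOF is computed here, which is the cost that upper-bound pruning is meant to reduce: once n candidate scores are known, any object whose LOF upper bound falls below the smallest of them can be skipped without computing its exact score.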