Parallel Outlier Detection Using KD-Tree Based on MapReduce

Qing He, Yunlong Ma, Qun Wang, Fuzhen Zhuang, Zhongzhi Shi
DOI: https://doi.org/10.1109/CloudCom.2011.20
2011-01-01
Abstract: Distributed and parallel algorithms have attracted a great deal of research interest in recent decades as a way to handle the large-scale data sets found in real-world applications. In this paper, we focus on a parallel implementation of a KD-Tree based outlier detection method for large-scale data sets. KD-Tree based outlier detection is one of the state-of-the-art methods and has been proven effective. However, its serial implementation cannot process large-scale data sets efficiently. Building on MapReduce, a current and powerful parallel programming framework, we propose a parallel KD-Tree based outlier detection algorithm (PKDTree for short). Experimental results demonstrate the efficiency of PKDTree under the evaluation criteria of scaleup, speedup, and sizeup.